ASF JIRA
Displaying 1000 issues at 19/Mar/20 20:35.
Columns populated in the records below: Project, Key, Summary, Issue Type, Status, Priority, Resolution, Assignee, Reporter, Created, Updated, Resolved, Affects Version/s, Fix Version/s, Component/s, Votes, Watchers, Linked Issues, Environment, Labels, Flags, Description.
ZOOKEEPER-1742: "make check" doesn't work on macos
Bug | Open | Major | Unresolved | Assignee: Michael Han | Reporter: Flavio Paiva Junqueira
Created: 21/Aug/13 11:51 | Updated: 05/Feb/20 07:16
Affects Version/s: 3.4.5, 3.5.0 | Fix Version/s: 3.7.0, 3.5.8 | Votes: 0 | Watchers: 6
Linked Issues: ZOOKEEPER-1795, ZOOKEEPER-1646, ZOOKEEPER-1077, ZOOKEEPER-2505

There are two problems I have spotted when running "make check" with the C client. First, it complains that the sleep call is not defined in two test files, tests/ZooKeeperQuorumServer.cc and tests/TestReconfigServer.cc; including unistd.h fixes this. The second problem is with linker options: it complains that "--wrap" is not valid. I'm not sure how to deal with this one yet, since I'm not sure why we are using it.

ZOOKEEPER-1741: bin scripts don't dereference symlinks
Bug | Resolved | Trivial | Duplicate | Assignee: Max Lapan | Reporter: Max Lapan
Created: 16/Aug/13 08:46 | Updated: 02/Oct/13 13:02 | Resolved: 02/Oct/13 13:02
Affects Version/s: 3.4.5 | Component/s: scripts | Votes: 0 | Watchers: 1
Environment: Centos 5.8

Symlinks on bin scripts are not dereferenced correctly ("set -x" added):

{noformat}
[root@tsthdp1 noarch]# which zookeeper-client
/usr/local/bin/zookeeper-client
[root@tsthdp1 noarch]# ls -la /usr/local/bin/zookeeper-client
lrwxrwxrwx 1 root root 40 Aug 16 15:56 /usr/local/bin/zookeeper-client -> /usr/local/hadoop/zookeeper/bin/zkCli.sh
[root@tsthdp1 noarch]# ls -la /usr/local/hadoop/zookeeper/bin
total 36
drwxr-xr-x 2 root root 4096 Aug 16 16:24 .
drwxr-xr-x 5 root root 4096 Aug 16 15:56 ..
-rwxr-xr-x 1 root root 1909 Aug 16 15:56 zkCleanup.sh
-rwxr-xr-x 1 root root 1536 Aug 16 16:22 zkCli.sh
-rwxr-xr-x 1 root root 2599 Aug 16 15:56 zkEnv.sh
-rwxr-xr-x 1 root root 4559 Aug 16 15:56 zkServer-initialize.sh
-rwxr-xr-x 1 root root 6246 Aug 16 15:56 zkServer.sh
[root@tsthdp1 noarch]# zookeeper-client
+ ZOOBIN=/usr/local/bin/zookeeper-client
++ dirname /usr/local/bin/zookeeper-client
+ ZOOBIN=/usr/local/bin
++ cd /usr/local/bin
++ pwd
+ ZOOBINDIR=/usr/local/bin
+ '[' -e /usr/local/bin/../libexec/zkEnv.sh ']'
+ . /usr/local/bin/zkEnv.sh
/usr/local/bin/zookeeper-client: line 37: /usr/local/bin/zkEnv.sh: no such file or directory
{noformat}

ZOOKEEPER-1740: Zookeeper 3.3.4 loses ephemeral nodes under stress
Bug | Resolved | Critical | Fixed | Assignee: Flavio Paiva Junqueira | Reporter: Neha Narkhede
Created: 15/Aug/13 14:18 | Updated: 06/Feb/16 23:15 | Resolved: 06/Feb/16 23:15
Affects Version/s: 3.3.4 | Component/s: server | Votes: 3 | Watchers: 10
Linked Issues: KAFKA-1387, ZOOKEEPER-1809

The current behavior of ZooKeeper for ephemeral nodes is that session expiration and ephemeral node deletion are not an atomic operation. The side effect of this ZooKeeper behavior in Kafka, for certain corner cases, is that ephemeral nodes can be lost even if the session is not expired. The sequence of events that can lead to lost ephemeral nodes is as follows:

1. The session expires on the client; it assumes the ephemeral nodes are deleted, so it establishes a new session with ZooKeeper and tries to re-create the ephemeral nodes.
2. However, when it tries to re-create the ephemeral node, ZooKeeper throws back a NodeExists error code. Now this is legitimate during a session disconnect event (since zkclient automatically retries the operation and raises a NodeExists error). Also by design, the Kafka server doesn't have multiple ZooKeeper clients create the same ephemeral node, so the Kafka server assumes the NodeExists is normal.
3. However, after a few seconds ZooKeeper deletes that ephemeral node. So from the client's perspective, even though the client has a new valid session, its ephemeral node is gone.

This behavior is triggered by very long fsync operations on the ZooKeeper leader. When the leader wakes up from such a long fsync operation, it has several sessions to expire, and the time between the session expiration and the ephemeral node deletion is magnified. Between these two operations, a ZooKeeper client can issue an ephemeral node creation operation that appears to have succeeded, but the leader later deletes the ephemeral node, leading to permanent ephemeral node loss from the client's perspective.

Thread from the zookeeper mailing list: http://zookeeper.markmail.org/search/?q=Zookeeper+3.3.4#query:Zookeeper%203.3.4%20date%3A201307%20+page:1+mid:zma242a2qgp6gxvx+state:results

The way to reproduce this behavior is as follows:

1. Bring up a zookeeper 3.3.4 cluster and create several sessions with ephemeral nodes on it using zkclient. Make sure the session expiration callback is implemented and re-registers the ephemeral node.
2. Run the following script on the zookeeper leader:
{noformat}
while true
do
  kill -STOP $1
  sleep 8
  kill -CONT $1
  sleep 60
done
{noformat}
3. Run another script to check for the existence of the ephemeral nodes. This script shows that zookeeper loses the ephemeral nodes while the clients still have a valid session.

ZOOKEEPER-1739: thread-safety bug in FastLeaderElection: the instance of WorkerSender is not safely published, so the WorkerSender thread may see WorkerSender.manager as the default value null
Bug | Open | Minor | Unresolved | Assignee: qingjie qiao | Reporter: qingjie qiao
Created: 09/Aug/13 00:29 | Updated: 10/Aug/13 23:01
Affects Version/s: 3.4.5 | Component/s: leaderElection | Votes: 0 | Watchers: 4

I am reading the trunk source code recently and found a thread-safety problem, but I'm not quite sure. In FastLeaderElection:

{code}
class WorkerSender implements Runnable {
    volatile boolean stop;
    QuorumCnxManager manager;

    WorkerSender(QuorumCnxManager manager) {
        this.stop = false;
        this.manager = manager;
    }

    public void run() { ... }
}
...
Messenger(QuorumCnxManager manager) {
    this.ws = new WorkerSender(manager);
    Thread t = new Thread(this.ws, "WorkerSender[myid=" + self.getId() + "]");
    t.setDaemon(true);
    t.start();

    this.wr = new WorkerReceiver(manager);
    t = new Thread(this.wr, "WorkerReceiver[myid=" + self.getId() + "]");
    t.setDaemon(true);
    t.start();
}
...
{code}

The instance of WorkerSender is constructed in the main thread, its manager field is assigned there, and it is then used in another thread. That later thread may see WorkerSender.manager as the default value null. The solution may be either (1) to change

{code}
WorkerSender(QuorumCnxManager manager) {
    this.stop = false;
    this.manager = manager;
}
{code}

to

{code}
WorkerSender(QuorumCnxManager manager) {
    this.manager = manager;
    this.stop = false;
}
{code}

or (2) to change

{code}
QuorumCnxManager manager;
{code}

to

{code}
final QuorumCnxManager manager;
{code}

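Of the two proposed fixes, declaring the field final is the one backed by an explicit Java Memory Model guarantee (JLS 17.5): a final field set in the constructor is visible to any thread that receives a properly constructed reference to the object, without extra synchronization. A minimal, self-contained sketch of the pattern (class and field names are illustrative, not ZooKeeper's):

```java
// Demonstrates the final-field publication guarantee (JLS 17.5): a final
// field assigned in the constructor is guaranteed visible to the thread
// that later runs the object, unlike a plain non-volatile field.
public class SafePublication {
    static class Worker implements Runnable {
        // 'final' guarantees visibility of 'name' to the thread running us.
        private final String name;

        Worker(String name) {
            this.name = name;
        }

        public void run() {
            // With a non-final, non-volatile field and no synchronization,
            // 'name' could in principle be observed as null here.
            System.out.println("worker sees: " + name);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(new Worker("sender"));
        t.start();
        t.join();
    }
}
```

Note that in the actual code the Thread constructor and Thread.start() also induce a happens-before edge, so the race window is narrow in practice; the final modifier simply makes the guarantee unconditional.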
ZOOKEEPER-1738: Xid out of order from a 3.4.5 client to a 3.3.5 cluster
Bug | Resolved | Major | Invalid | Assignee: Unassigned | Reporter: Vincent Bernat
Created: 07/Aug/13 10:43 | Updated: 24/Oct/13 01:41 | Resolved: 24/Oct/13 01:41
Affects Version/s: 3.3.5 | Votes: 0 | Watchers: 1
Environment: Server: zookeeper 3.3.5+dfsg1-1ubuntu1; Client: zookeeper 3.4.5 from Cloudera 4.3.0

This happens in the context of HBase master nodes getting connections from HBase region servers. Once an HBase region server joins the cluster, I get the following error:

{code}
2013-08-07 13:35:18,676 WARN org.apache.zookeeper.ClientCnxn: Session 0xd4058c4d7940003 for server zk-01.dev.dailymotion.com/10.194.60.13:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Xid out of order. Got Xid 56 with err -101 expected Xid 55 for a packet with details: clientPath:null serverPath:null finished:false header:: 55,14 replyHeader:: 0,0,-4 request:: org.apache.zookeeper.MultiTransactionRecord@360193e5 response:: org.apache.zookeeper.MultiResponse@0
    at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:795)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2013-08-07 13:35:18,676 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
2013-08-07 13:35:18,676 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper multi failed after 3 retries
2013-08-07 13:35:18,677 ERROR org.apache.hadoop.hbase.master.AssignmentManager: Unable to ensure that the table -ROOT- will be enabled because of a ZooKeeper issue
2013-08-07 13:35:18,677 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2013-08-07 13:35:18,677 FATAL org.apache.hadoop.hbase.master.HMaster: Unable to ensure that the table -ROOT- will be enabled because of a ZooKeeper issue
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:931)
    at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1440)
    at org.apache.hadoop.hbase.zookeeper.ZKTable.setTableState(ZKTable.java:245)
    at org.apache.hadoop.hbase.zookeeper.ZKTable.setEnabledTable(ZKTable.java:325)
    at org.apache.hadoop.hbase.master.AssignmentManager.setEnabledTable(AssignmentManager.java:3576)
    at org.apache.hadoop.hbase.master.AssignmentManager.setEnabledTable(AssignmentManager.java:2340)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1674)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1424)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1399)
    at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1394)
    at org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:105)
    at org.apache.hadoop.hbase.master.AssignmentManager.addToRITandCallClose(AssignmentManager.java:675)
    at org.apache.hadoop.hbase.master.AssignmentManager.processRegionsInTransition(AssignmentManager.java:586)
    at org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransition(AssignmentManager.java:525)
    at org.apache.hadoop.hbase.master.AssignmentManager.processRegionInTransitionAndBlockUntilAssigned(AssignmentManager.java:489)
    at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:679)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:583)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:395)
    at java.lang.Thread.run(Thread.java:722)
2013-08-07 13:35:18,678 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2013-08-07 13:35:18,678 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server stopped; skipping assign of -ROOT-,,0.70236052 state=OFFLINE, ts=1375881792131, server=null
2013-08-07 13:35:18,678 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Waiting on 70236052/-ROOT-
2013-08-07 13:35:18,678 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: masternode-01.dev.dailymotion.com,60000,1375880747185.timeoutMonitor exiting
2013-08-07 13:35:18,679 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=M_ZK_REGION_OFFLINE, server=masternode-01.dev.dailymotion.com,60000,1375880747185, region=70236052/-ROOT-, which is more than 15 seconds late
2013-08-07 13:35:18,776 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/root-region-server
2013-08-07 13:35:18,776 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
2013-08-07 13:35:18,776 ERROR org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper getData failed after 3 retries
2013-08-07 13:35:18,777 INFO org.apache.hadoop.hbase.util.RetryCounter: Sleeping 2000ms before retry #1...
{code}

ZOOKEEPER-1737: zk scripts no longer work when symlinked
Bug | Patch Available | Major | Unresolved | Assignee: Chris Seawood | Reporter: Chris Seawood
Created: 05/Aug/13 15:12 | Updated: 30/Sep/13 20:14
Affects Version/s: 3.4.5 | Component/s: scripts | Votes: 0 | Watchers: 1
Environment: RHEL6.4

At some point since 3.3, the shell scripts were updated to move away from using readlink to using BASH_SOURCE. The problem is that BASH_SOURCE doesn't resolve symlinks, so when /usr/bin/zookeeper-cli is symlinked to /usr/lib/zookeeper/bin/zkCli.sh it fails every single time.

ZOOKEEPER-1736: Zookeeper SASL authentication allows anonymous users to log in
Bug | Resolved | Major | Not A Problem | Assignee: Unassigned | Reporter: AntonioS
Created: 26/Jul/13 04:40 | Updated: 19/Mar/19 08:58 | Resolved: 10/Oct/13 13:43
Component/s: server | Votes: 0 | Watchers: 7 | Labels: ssl-tls
Environment: Development

Hello. I have configured Zookeeper to provide SASL authentication, using an ordinary username and password stored in the jaas.conf as a DigestLoginModule. I have created a simple jaas.conf file:

{code}
Server {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    user_admin="admin";
};
Client {
    org.apache.zookeeper.server.auth.DigestLoginModule required
    username="admin"
    password="admin";
};
{code}

I have zoo.cfg correctly configured for security, adding the following:

{code}
requireClientAuthScheme=sasl
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
jaasLoginRenew=3600000
zookeeper.allowSaslFailedClients=false
{code}

And I also have the java.env file:

{code}
export JVMFLAGS="-Djava.security.auth.login.config=/etc/zookeeper/conf/jaas.conf -Dzookeeper.allowSaslFailedClients=false"
{code}

Everything looks good. If I put in the right username and password I authenticate; otherwise I don't, and I get an exception. The problem is that when I don't put in any username and password at all, ZooKeeper allows me to go through. I tried different things but nothing stops anonymous users from logging in. I was looking at the source code, in particular this method in ZooKeeperServer.java:

{code}
public void processPacket(ServerCnxn cnxn, ByteBuffer incomingBuffer) throws IOException {
{code}

The section below:

{code}
} else {
    if (h.getType() == OpCode.sasl) {
        Record rsp = processSasl(incomingBuffer, cnxn);
        ReplyHeader rh = new ReplyHeader(h.getXid(), 0, KeeperException.Code.OK.intValue());
        cnxn.sendResponse(rh, rsp, "response"); // not sure about 3rd arg..what is it?
    } else {
        Request si = new Request(cnxn, cnxn.getSessionId(), h.getXid(), h.getType(), incomingBuffer, cnxn.getAuthInfo());
        si.setOwner(ServerCnxn.me);
        submitRequest(si);
    }
}
{code}

The else flow appears to just forward any anonymous request to the handler, without attempting any authentication. Is this a bug? Is there any way to stop anonymous users connecting to Zookeeper? Thanks, Antonio

ZOOKEEPER-1735: ZOOKEEPER-1722 Make ZooKeeper easier to test - support simulating a connection loss
Sub-task | Open | Major | Unresolved | Assignee: Unassigned | Reporter: Jordan Zimmerman
Created: 22/Jul/13 17:51 | Updated: 13/Aug/13 03:28
Component/s: java client | Votes: 0 | Watchers: 1 | Linked Issues: ZOOKEEPER-1730

As part of making ZooKeeper clients more test friendly, it would be useful to easily simulate a connection loss event.

ZOOKEEPER-1734: Zookeeper fails to connect if one zookeeper host is down on EC2 when using elastic IP (UnknownHostException)
Bug | Open | Major | Unresolved | Assignee: Unassigned | Reporter: Andy Grove
Created: 22/Jul/13 16:28 | Updated: 15/Jul/14 13:22
Affects Version/s: 3.4.5 | Component/s: java client | Votes: 1 | Watchers: 7
Linked Issues: ZOOKEEPER-1576
Environment: Amazon EC2. Linux.

We use Amazon Elastic IP for zookeeper hosts so that the zookeeper hosts have the same IP address after a restart. The issue is that if one host is down then we cannot connect to the other hosts. Here is an example connect string: "ec2-1-2-3-4.compute-1.amazonaws.com, ec2-4-3-2-1.compute-1.amazonaws.com, ec2-5-5-5-5.compute-1.amazonaws.com". If all three hosts are up, we can connect. If one host is down, then we cannot create a ZooKeeper instance due to an UnknownHostException, even though the other servers in the connect string are valid.

{noformat}
java.net.UnknownHostException: ec2-5-5-5-5.compute-1.amazonaws.com
    at java.net.InetAddress.getAllByName0(InetAddress.java:1243)
    at java.net.InetAddress.getAllByName(InetAddress.java:1155)
    at java.net.InetAddress.getAllByName(InetAddress.java:1091)
    at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
{noformat}

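The stack trace shows that StaticHostProvider resolves every host in the connect string eagerly, so a single dead DNS entry aborts the constructor. One client-side workaround, sketched below under the assumption that dropping unresolvable hosts is acceptable for the application, is to filter the connect string before constructing the ZooKeeper handle (the helper class is hypothetical, not part of ZooKeeper; ".invalid" is a reserved TLD that never resolves):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: drop connect-string entries whose hostname does not
// currently resolve, so StaticHostProvider's eager InetAddress.getAllByName()
// call cannot fail on the first dead host. Port suffixes are preserved.
public class ResolvableHosts {
    public static String filter(String connectString) {
        List<String> ok = new ArrayList<>();
        for (String entry : connectString.split(",")) {
            String trimmed = entry.trim();
            // Strip a ":port" suffix for the resolution check only.
            String host = trimmed.contains(":")
                    ? trimmed.substring(0, trimmed.indexOf(':'))
                    : trimmed;
            try {
                InetAddress.getByName(host); // throws if unresolvable
                ok.add(trimmed);
            } catch (UnknownHostException e) {
                // Skip hosts that do not resolve right now.
            }
        }
        return String.join(",", ok);
    }

    public static void main(String[] args) {
        System.out.println(filter("localhost:2181,no-such-host.invalid:2181"));
    }
}
```

The filtered string can then be passed to the ZooKeeper constructor as usual. The trade-off is that a host that is merely down at startup is excluded until the client is recreated.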
ZOOKEEPER-1733: FLETest#testLE is flaky on windows boxes
Bug | Closed | Major | Fixed | Assignee: Jeffrey Zhong | Reporter: Jeffrey Zhong
Created: 19/Jul/13 20:31 | Updated: 13/Mar/14 14:17 | Resolved: 18/Dec/13 10:48
Affects Version/s: 3.4.5 | Fix Version/s: 3.4.6, 3.5.0 | Votes: 0 | Watchers: 6
Linked Issues: ZOOKEEPER-1845 | Hadoop Flags: Reviewed

FLETest#testLE fails intermittently on windows boxes. The reason is that in LEThread#run() we have:

{code}
if (leader == i) {
    synchronized (finalObj) {
        successCount++;
        if (successCount > (count / 2)) finalObj.notify();
    }
    break;
}
{code}

Basically, once we have a confirmed leader, the leader thread dies due to the "break" out of the while loop. Then in the verification step, we check whether the leader thread is alive as follows:

{code}
if (threads.get((int) leader).isAlive()) {
    Assert.fail("Leader hasn't joined: " + leader);
}
{code}

On windows boxes, the above verification step fails frequently because the leader thread has most likely already exited. Do we know why we have the leader-alive verification step, when only the leader thread can bump successCount up to count/2 or beyond?

ZOOKEEPER-1732: ZooKeeper server unable to join established ensemble
Bug | Closed | Blocker | Fixed | Assignee: Germán Blanco | Reporter: Germán Blanco
Created: 19/Jul/13 12:14 | Updated: 13/Mar/14 14:17 | Resolved: 29/Oct/13 23:22
Affects Version/s: 3.4.5 | Fix Version/s: 3.4.6, 3.5.0 | Component/s: leaderElection | Votes: 0 | Watchers: 12
Linked Issues: ZOOKEEPER-1805 | Hadoop Flags: Reviewed
Environment: Windows 7, Java 1.7

I have a test in which I do a rolling restart of three ZooKeeper servers, and it was failing from time to time. I ran the tests in a loop until the failure came out, and it seems that at some point one of the servers is unable to join the ensemble formed by the other two.

ZOOKEEPER-1731: Unsynchronized access to ServerCnxnFactory.connectionBeans results in deadlock
Bug | Closed | Critical | Fixed | Assignee: Dave Latham | Reporter: Dave Latham
Created: 16/Jul/13 13:59 | Updated: 02/Mar/16 20:33 | Resolved: 02/Aug/13 13:45
Fix Version/s: 3.4.6 | Votes: 0 | Watchers: 8

We had a cluster of 3 peers (running 3.4.3) fail after we took down 1 peer briefly for maintenance. A second peer became unresponsive and the leader lost quorum. Thread dumps on the second peer showed two threads consistently stuck in these states:

{noformat}
"QuorumPeer[myid=0]/0.0.0.0:2181" prio=10 tid=0x00002aaab8d20800 nid=0x598a runnable [0x000000004335d000]
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.put(HashMap.java:405)
    at org.apache.zookeeper.server.ServerCnxnFactory.registerConnection(ServerCnxnFactory.java:131)
    at org.apache.zookeeper.server.ZooKeeperServer.finishSessionInit(ZooKeeperServer.java:572)
    at org.apache.zookeeper.server.quorum.Learner.revalidate(Learner.java:444)
    at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:133)
    at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:86)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)

"NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181" daemon prio=10 tid=0x00002aaab84b0800 nid=0x5986 runnable [0x0000000040878000]
   java.lang.Thread.State: RUNNABLE
    at java.util.HashMap.removeEntryForKey(HashMap.java:614)
    at java.util.HashMap.remove(HashMap.java:581)
    at org.apache.zookeeper.server.ServerCnxnFactory.unregisterConnection(ServerCnxnFactory.java:120)
    at org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:971)
    - locked <0x000000078d8a51f0> (a java.util.HashSet)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.closeSessionWithoutWakeup(NIOServerCnxnFactory.java:307)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.closeSession(NIOServerCnxnFactory.java:294)
    - locked <0x000000078d82c750> (a org.apache.zookeeper.server.NIOServerCnxnFactory)
    at org.apache.zookeeper.server.ZooKeeperServer.processConnectRequest(ZooKeeperServer.java:834)
    at org.apache.zookeeper.server.NIOServerCnxn.readConnectRequest(NIOServerCnxn.java:410)
    at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:200)
    at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:236)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
    at java.lang.Thread.run(Thread.java:662)
{noformat}

It shows both threads concurrently modifying ServerCnxnFactory.connectionBeans, which is a java.util.HashMap. This cluster was serving thousands of clients, which seems to make this condition more likely, as it appears to occur when one client connects and another disconnects at about the same time.

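The underlying hazard is two threads mutating a plain java.util.HashMap, which is undefined behavior; on pre-Java-8 HashMaps a concurrent put and remove could corrupt the bucket chains and leave a thread spinning inside put() or remove() exactly as the dumps show. A minimal sketch of the usual remedy, swapping in a ConcurrentHashMap (names mirror the report for illustration; this is not the actual ZooKeeper patch):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: concurrent register/unregister against a ConcurrentHashMap is
// well-defined and lock-free for readers, whereas the same access pattern
// on a plain HashMap can corrupt the table or hang.
public class ConnectionRegistry {
    private final Map<Long, String> connectionBeans = new ConcurrentHashMap<>();

    void registerConnection(long sessionId, String bean) {
        connectionBeans.put(sessionId, bean);
    }

    void unregisterConnection(long sessionId) {
        connectionBeans.remove(sessionId);
    }

    public static void main(String[] args) throws InterruptedException {
        ConnectionRegistry r = new ConnectionRegistry();
        // Two threads hammer the same map, mimicking the NIO factory thread
        // and the quorum peer thread from the report.
        Thread adder = new Thread(() -> {
            for (long i = 0; i < 100_000; i++) r.registerConnection(i, "cnxn");
        });
        Thread remover = new Thread(() -> {
            for (long i = 0; i < 100_000; i++) r.unregisterConnection(i);
        });
        adder.start();
        remover.start();
        adder.join();
        remover.join();
        // With ConcurrentHashMap both threads always terminate and the map
        // is left in a consistent state.
        System.out.println("completed: " + (r.connectionBeans.size() <= 100_000));
    }
}
```

Another equally valid remedy is wrapping every access to the shared map in a common synchronized block; ConcurrentHashMap simply avoids introducing a new lock-ordering concern.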
ZOOKEEPER-1730: ZOOKEEPER-1722 Make ZooKeeper easier to test - support simulating a session expiration
Sub-task | Resolved | Major | Fixed | Assignee: Jordan Zimmerman | Reporter: Jordan Zimmerman
Created: 15/Jul/13 23:37 | Updated: 01/Apr/14 07:10 | Resolved: 31/Mar/14 22:03
Fix Version/s: 3.5.0 | Component/s: java client | Votes: 0 | Watchers: 4
Linked Issues: ZOOKEEPER-1735

As part of making ZooKeeper clients more test friendly, it would be useful to easily simulate a session loss event.

ZOOKEEPER-1729: Add l4w command "snap" to trigger log rotation and snapshotting
Improvement | Open | Minor | Unresolved | Assignee: Thawan Kooburat | Reporter: Thawan Kooburat
Created: 15/Jul/13 23:31 | Updated: 17/Feb/17 10:08
Component/s: server | Votes: 0 | Watchers: 2
Linked Issues: ZOOKEEPER-2700, ZOOKEEPER-1346

The "snap" command can be used to trigger log rotation and snapshotting on each server. One use case for this command is to make server restart faster by issuing the snap command before restarting the server. This helps when the txnlog is large (due to txn size or number of txns). snap is a blocking command; it returns when the snapshot is written to disk, so it is safe to call prior to restarting the server.

ZOOKEEPER-1728: Better error message when reconfig invoked in standalone mode
Improvement | Resolved | Minor | Fixed | Assignee: Alexander Shraer | Reporter: Alexander Shraer
Created: 13/Jul/13 18:02 | Updated: 01/Apr/14 07:10 | Resolved: 31/Mar/14 19:47
Affects Version/s: 3.5.0 | Fix Version/s: 3.5.0 | Votes: 0 | Watchers: 4

For now, reconfig is not supported in standalone mode, but when invoked it should return something better than the current ClassCastException. The patch throws a KeeperException.UnimplementedException in this case (most errors are reported through exceptions).

ZOOKEEPER-1727: Doc request: The right way to expand a cluster
Wish | Resolved | Minor | Duplicate | Assignee: Alexander Shraer | Reporter: Justin SB
Created: 12/Jul/13 21:31 | Updated: 13/Jul/13 15:58 | Resolved: 13/Jul/13 15:43
Affects Version/s: 3.5.0 | Fix Version/s: 3.5.0 | Votes: 0 | Watchers: 4
Linked Issues: ZOOKEEPER-1660

When expanding a cluster from 2 to 3 servers, if ZK server #3 isn't up yet, the reconfig request seems to time out with a connection-loss error. The configuration is updated, though. So we could wait, reconnect, and then refetch the config to make sure we did join the quorum, though that seems a little bit hacky! What is the correct way to do this (and cluster growth in general)? Should we bring up new ZK servers before issuing the reconfig command? What is the right way to bring up new ZK servers (connect as a client, request the config, save the config to the zk.conf.dynamic file, add our new server line to the new zk.conf.dynamic file, start the new server, call reconfig as a client to the existing cluster)? Is this documented anywhere? (Just the steps to do it "the right way" would be great, no need for actual code.) :-)

ZOOKEEPER-1726: No way to dynamically go from 1 ZK server -> 2 ZK servers?
Bug | Resolved | Major | Duplicate | Assignee: Unassigned | Reporter: Justin SB
Created: 12/Jul/13 20:45 | Updated: 12/Jul/13 20:53 | Resolved: 12/Jul/13 20:53
Affects Version/s: 3.5.0 | Votes: 0 | Watchers: 2

The dynamic reconfiguration feature is great, but it doesn't seem to be possible to go from 1 server to 2 servers (1 server + 1 observer). When there's only one server, ZK automatically starts in single server mode; when in single server mode, trying to add a server causes a class cast exception because the server is a ZooKeeperServer, not a LeaderZooKeeperServer.

ZOOKEEPER-1725: Zookeeper Dynamic Conf writes out hostnames when IPs are supplied
Bug | Resolved | Minor | Fixed | Assignee: Michi Mutsuzaki | Reporter: Justin SB
Created: 12/Jul/13 20:43 | Updated: 01/Apr/14 07:10 | Resolved: 31/Mar/14 19:57
Affects Version/s: 3.5.0 | Fix Version/s: 3.5.0 | Votes: 0 | Watchers: 5

When writing the dynamic configuration out, Zookeeper writes out hostnames, even if an IP address is supplied. These may not correctly round-trip (e.g. 127.0.0.1 might be written as localhost, which may then resolve to 127.0.0.1 and another IP address). This isn't actually causing problems for me right now, but seems very likely to cause hard-to-track-down problems in future.

ZOOKEEPER-1724: Support Kerberos authentication for non-SUN JDK
Improvement | Open | Major | Unresolved | Assignee: Bing Li | Reporter: Bing Li
Created: 08/Jul/13 21:23 | Updated: 05/Feb/20 07:16
Affects Version/s: 3.4.5, 3.4.6, 3.5.0 | Fix Version/s: 3.7.0, 3.5.8 | Votes: 1 | Watchers: 3

The current Login class only supports running with the SUN JDK when Kerberos is enabled. In order to support alternative JDKs like the IBM JDK, which has different options supported by Krb5LoginModule, the Login class should be changed.

| ZooKeeper | ZOOKEEPER-1723 | unique ensemble identifier |
Bug | Open | Major | Unresolved | Unassigned | Mohammad Shamma | Mohammad Shamma | 08/Jul/13 16:46 | 08/Jul/13 16:46 | server | 0 | 1 | Zookeeper ensembles need an identifier that would prevent a misconfigured zookeeper server from clobbering the configuration of a zookeeper ensemble. Use case: - A zookeeper based distributed system that grows its zookeeper ensemble incrementally. - The system is reset, where the new zookeeper ensemble is a subset of the old zookeeper ensemble (the history of the new ensemble has been reset too). - The old zookeeper servers will attempt to communicate with the new servers (assuming the network end-points remain the same). - The new zookeeper servers will notice that the old zookeeper servers have a higher configuration version and will attempt to reconfigure based on the old ensemble configuration info. Note that this could be solved if the reset process stopped every zookeeper server in the old deployment and deleted its history. However, some of these servers might be down at the time of reset, so this solution is not reliable. I am sure this is not the most generic description of the problem of not having ensemble identifiers, but it presents a use case for introducing them to prevent servers from cross-talking across different ensembles. Otherwise they will automatically join in to form a single ensemble. |
336902 | No Perforce job exists for this issue. | 0 | 337225 | 6 years, 37 weeks, 3 days ago | 0|i1m41b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1722 | Make ZooKeeper clients more test friendly |
Improvement | Open | Major | Unresolved | Unassigned | Thawan Kooburat | Thawan Kooburat | 08/Jul/13 13:57 | 15/Jul/13 23:45 | c client, java client | 0 | 4 | ZOOKEEPER-1730, ZOOKEEPER-1735 | We should expose a few more API calls that allow users to write unit tests covering various failure scenarios (similar to the TestableZooKeeper in the zookeeper tests). This should also minimize the effort of setting up a test framework for application developers. Here are some example calls that we should provide. 1. A zookeeper_close() that doesn't actually send a close request to the server: this can be used to simulate a client crash without actually crashing the test program. 2. Allow the client to trigger a CONNECTION_LOSS or SESSION_EXPIRE event: this will allow users to test their watchers and callbacks (and possible race conditions). |
336864 | No Perforce job exists for this issue. | 0 | 337187 | 6 years, 36 weeks, 2 days ago | 0|i1m3sv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1721 | Ability to run without writing to disk |
New Feature | Open | Major | Unresolved | Unassigned | Radim Kolar | Radim Kolar | 06/Jul/13 11:51 | 09/Jul/13 11:46 | 3.4.5 | server | 0 | 2 | I use zookeeper for cluster synchronization. We have no need for keeping persistent state across zookeeper restarts. As a performance enhancement, it would be good to have the possibility to run without writing snapshots and logs. | 336711 | No Perforce job exists for this issue. | 0 | 337034 | 6 years, 37 weeks, 2 days ago | 0|i1m2uv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1720 | Race in zookeeper_close() leads to hang |
Bug | Open | Major | Unresolved | Unassigned | Kevin Jamieson | Kevin Jamieson | 05/Jul/13 23:34 | 17/Oct/17 08:14 | 3.5.0 | c client | 1 | 4 | Ubuntu 12.04.1 | Using ZK 3.5.4, zookeeper_close() occasionally hangs with a backtrace of the form: {noformat} #0 0x00002b255fab489c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0 #1 0x00002b255fab26b0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0 #2 0x00002b2560568ced in unlock_completion_list (l=0x13f5430) at src/mt_adaptor.c:69 #3 0x00002b256055b9ec in free_completions (zh=0x13f5270, callCompletion=1, reason=-116) at src/zookeeper.c:1521 #4 0x00002b256055d3bc in zookeeper_close (zh=0x13f5270) at src/zookeeper.c:2954 {noformat} At which point the zhandle_t struct appears to have already been freed, as it contains garbage: {noformat} (gdb) p zh->sent_requests.cond $19 = { __data = { __lock = 2, __futex = 0, __total_seq = 18446744073709551615, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0 }, __size = "\002\000\000\000\000\000\000\000\377\377\377\377\377\377\377\377", '\000' <repeats 31 times>, __align = 2 } {noformat} There appears to be a race condition in the following code: {noformat} int api_epilog(zhandle_t *zh,int rc) { if(inc_ref_counter(zh,-1)==0 && zh->close_requested!=0) zookeeper_close(zh); return rc; } int zookeeper_close(zhandle_t *zh) { int rc=ZOK; if (zh==0) return ZBADARGUMENTS; zh->close_requested=1; if (inc_ref_counter(zh,1)>1) { {noformat} As api_epilog() may free zh in between zookeeper_close() setting zh->close_requested=1 and incrementing the reference count. 
The following patch should fix the problem: {noformat} diff --git a/src/c/src/zookeeper.c b/src/c/src/zookeeper.c index 6943243..61a263a 100644 --- a/src/c/src/zookeeper.c +++ b/src/c/src/zookeeper.c @@ -1051,6 +1051,7 @@ zhandle_t *zookeeper_init(const char *host, watcher_fn watcher, goto abort; } + api_prolog(zh); return zh; abort: errnosave=errno; @@ -2889,7 +2890,7 @@ int zookeeper_close(zhandle_t *zh) return ZBADARGUMENTS; zh->close_requested=1; - if (inc_ref_counter(zh,1)>1) { + if (inc_ref_counter(zh,0)>1) { /* We have incremented the ref counter to prevent the * completions from calling zookeeper_close before we have * completed the adaptor_finish call below. */ {noformat} |
336664 | No Perforce job exists for this issue. | 0 | 336987 | 2 years, 22 weeks, 2 days ago | 0|i1m2kf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1719 | zkCli.sh, zkServer.sh and zkEnv.sh regression caused by ZOOKEEPER-1663 |
Bug | Closed | Major | Fixed | Marshall McMullen | Marshall McMullen | Marshall McMullen | 25/Jun/13 14:00 | 25/Jul/14 07:25 | 28/Jun/13 12:28 | 3.4.5, 3.5.0 | 3.4.6, 3.5.0 | 0 | 5 | Linux (Ubuntu 12.04) with dash shell | This fix from ZOOKEEPER-1663 is incorrect. It assumes the shell is bash since it uses bash array construction, e.g.: {code} 96 LIBPATH=("${ZOOKEEPER_PREFIX}"/share/zookeeper/*.jar) {code} This does NOT work if /bin/sh points to /bin/dash as it does on Ubuntu. It fails as so: {quote} zkEnv.sh: 96: zkEnv.sh: Syntax error: "(" unexpected (expecting "fi") {quote} If I change the shebang at the top to use "/bin/bash" instead of "/bin/sh" it works as expected. I don't know the full details of why using a bash array was chosen as the solution but I don't think it is the right way to deal with spaces in these paths... |
335045 | No Perforce job exists for this issue. | 1 | 335369 | 5 years, 34 weeks, 6 days ago | 0|i1lslj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1718 | Support JLine 2 |
Test | Resolved | Critical | Fixed | Manikumar | Christopher Tubbs | Christopher Tubbs | 19/Jun/13 13:29 | 18/Nov/14 19:56 | 01/Oct/13 17:19 | 3.5.0 | 1 | 8 | ACCUMULO-1510, ZOOKEEPER-1655, ZOOKEEPER-1773, YARN-2815, ZOOKEEPER-2085 | not fixed | 334033 | No Perforce job exists for this issue. | 2 | 334359 | 6 years, 25 weeks, 1 day ago | JLine upgraded to version 2.11 |
Reviewed
|
0|i1lmef: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1717 | Flex code works in debug mode, not in run mode |
Bug | Open | Major | Unresolved | Unassigned | hareesh | hareesh | 14/Jun/13 11:22 | 01/Sep/13 09:27 | 4.0.0 | 4.0.0 | 0 | 2 | 43200 | 43200 | 0% | In my Flex application, when I debug the code it gives the correct result, but if I run it in run mode it does not give the correct result. I tried to find out what is happening, but did not learn anything. Could you please give me some suggestions? |
0% | 0% | 43200 | 43200 | 333224 | No Perforce job exists for this issue. | 0 | 333552 | 6 years, 29 weeks, 4 days ago |
Incompatible change, Reviewed
|
0|i1lhf3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1716 | jute/Utils.fromCSVBuffer cannot parse data returned by toCSVBuffer |
Bug | Patch Available | Major | Unresolved | Charlie Helin | Robert Joseph Evans | Robert Joseph Evans | 11/Jun/13 17:48 | 14/Oct/15 17:01 | 3.5.0 | jute | 0 | 2 | I was trying to use org.apache.zookeeper.server.LogFormatter to analyze the access pattern of a particular application. As part of this I wanted to get the size of the data that was being written into ZK. I ran into an issue where in some cases the hex data had an odd length. I looked into it and found that the buffer is being written out using Integer.toHexString(barr[idx]). Looking at the javadoc for toHexString, it indicates that it does not pad the bits at all, and will output the two's complement of the number if it is negative. I then looked at how the data was being parsed and it assumed that every byte consisted of exactly two characters, which is not true. {code} Utils.toCSVBuffer(new byte[] {0xff}) returns "#ffffffff" Utils.toCSVBuffer(new byte[] {0x01}) returns "#1" If I combine those Utils.fromCSVBuffer(Utils.toCSVBuffer(new byte[] {0xff, 0x01, 0xff})) will return {0xff, 0xff, 0xff, 0xff, 0x1f, 0xff, 0xff, 0xff} {code} I think what we want is something like {code} static final char[] NIBBLE_TO_HEX = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' }; static String toCSVBuffer(byte barr[]) { if (barr == null || barr.length == 0) { return ""; } StringBuilder sb = new StringBuilder(2 * barr.length + 1); sb.append('#'); for(int idx = 0; idx < barr.length; idx++) { byte b = barr[idx]; sb.append(NIBBLE_TO_HEX[(b&0xf0)>>4]); sb.append(NIBBLE_TO_HEX[b&0x0f]); } return sb.toString(); } {code} |
332607 | No Perforce job exists for this issue. | 1 | 332936 | 4 years, 23 weeks, 1 day ago | 0|i1ldm7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
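The padding bug reported in ZOOKEEPER-1716 above comes down to emitting exactly two hex digits per byte. A minimal, self-contained Java sketch of a round-trippable encoding follows; the class name is hypothetical, this is not the actual jute `Utils` code, and it emits the high nibble first so the output matches conventional hex.

```java
public class HexCsv {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Emit exactly two hex characters per byte, high nibble first, so the
    // string can be parsed back without guessing byte boundaries.
    static String toCSVBuffer(byte[] barr) {
        StringBuilder sb = new StringBuilder(1 + 2 * barr.length).append('#');
        for (byte b : barr) {
            sb.append(HEX[(b >> 4) & 0x0f]).append(HEX[b & 0x0f]);
        }
        return sb.toString();
    }

    // Inverse of toCSVBuffer: every byte is exactly two characters.
    static byte[] fromCSVBuffer(String s) {
        byte[] out = new byte[(s.length() - 1) / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(1 + 2 * i, 3 + 2 * i), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toCSVBuffer(new byte[] {(byte) 0xff, 0x01})); // prints #ff01
    }
}
```

With fixed-width digits, `fromCSVBuffer(toCSVBuffer(b))` recovers the original bytes, which is exactly the property the unpadded `Integer.toHexString` version lacks.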
| ZooKeeper | ZOOKEEPER-1715 | Upgrade netty version |
Improvement | Closed | Major | Fixed | Sean Bridges | Sean Bridges | Sean Bridges | 08/Jun/13 01:02 | 04/May/16 18:00 | 14/Dec/13 03:38 | 3.4.5 | 3.4.6, 3.5.0 | 2 | 5 | ZOOKEEPER-1838, ZOOKEEPER-1763, ZOOKEEPER-1681 | zookeeper 3.4.5 uses netty 3.2.2, which was released in August 2010. The latest version of netty is 3.6.6 released May 2013. Zookeeper should consider upgrading. | Upgrade netty version |
332170 | No Perforce job exists for this issue. | 4 | 332499 | 6 years, 2 weeks ago | 0|i1laxj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1714 | perl client segfaults if ZOO_READ_ACL_UNSAFE constant is used |
Bug | Closed | Minor | Fixed | Botond Hejj | Botond Hejj | Botond Hejj | 06/Jun/13 09:06 | 13/Mar/14 14:17 | 21/Jun/13 16:01 | 3.4.5 | 3.4.6, 3.5.0 | contrib-bindings | 1 | 7 | If the ZOO_READ_ACL_UNSAFE or ZOO_CREATOR_ALL_ACL constant is used, then the client core dumps with a segmentation fault. | 331648 | No Perforce job exists for this issue. | 2 | 331979 | 6 years, 2 weeks ago | 0|i1l7qn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1713 | wrong time calculation in zkfuse.cc |
Bug | Closed | Trivial | Fixed | Germán Blanco | Germán Blanco | Germán Blanco | 06/Jun/13 05:51 | 13/Mar/14 14:16 | 02/Sep/13 16:23 | 3.4.5 | 3.4.6, 3.5.0 | 0 | 5 | Linux | A colleague of mine spotted this error in the time calculation in the code in zkfuse.cc, lines 81 to 85: inline uint64_t nanosecsToMillisecs(uint64_t nanosecs) { return nanosecs * 1000000; } I am not sure how this method is used, but it will certainly cause something to go wrong wherever it is. |
331624 | No Perforce job exists for this issue. | 1 | 331955 | 6 years, 2 weeks ago | 0|i1l7lb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
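The zkfuse bug in ZOOKEEPER-1713 above is a units error: 1 millisecond is 1,000,000 nanoseconds, so converting nanoseconds to milliseconds must divide, not multiply. A corrected sketch (shown in Java for consistency with the other examples here; the original code is C++):

```java
public class TimeConv {
    // 1 ms = 1,000,000 ns, so nanoseconds -> milliseconds divides.
    static long nanosecsToMillisecs(long nanosecs) {
        return nanosecs / 1_000_000L;
    }

    public static void main(String[] args) {
        System.out.println(nanosecsToMillisecs(1_500_000_000L)); // prints 1500
    }
}
```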
| ZooKeeper | ZOOKEEPER-1712 | transient test failure in TestReconfig.cc |
Bug | Resolved | Major | Duplicate | Marshall McMullen | Camille Fournier | Camille Fournier | 04/Jun/13 13:00 | 16/Apr/16 09:04 | 04/Jun/13 13:24 | 0 | 3 | ZOOKEEPER-2152, ZOOKEEPER-1594 | zktest-mt | From the latest build logs: [exec] Zookeeper_watchers::testChildWatcher2 : elapsed 54 : OK [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestReconfig.cc:183: Assertion: equality assertion failed [Expected: 1, Actual : 0] [exec] Failures !!! [exec] Run: 67 Failure total: 1 Failures: 1 Errors: 0 [exec] FAIL: zktest-mt [exe |
331260 | No Perforce job exists for this issue. | 0 | 331593 | 6 years, 42 weeks, 2 days ago | 0|i1l5cv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1711 | ZooKeeper server binds to all ip addresses for leader election and broadcast |
Bug | Closed | Minor | Duplicate | Unassigned | Germán Blanco | Germán Blanco | 31/May/13 07:16 | 13/Mar/14 14:17 | 29/Aug/13 09:22 | 3.4.5 | 3.4.6 | server | 1 | 3 | 259200 | 259200 | 0% | ZOOKEEPER-1096 | Any | Unlike current ZooKeeper version in trunk intended for release as 3.5.0, the current ZooKeeper server version 3.4.5 binds to all ip addresses on the specified port for election. It only makes sense to bind to the ip address indicated in the configuration file, which is where the other servers will connect. Listening to other ip addresses could have bad security implications. | 0% | 0% | 259200 | 259200 | 330638 | No Perforce job exists for this issue. | 0 | 330972 | 6 years, 2 weeks ago | 0|i1l1j3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1710 | Leader should not use txnlog for synchronization if txnlog is corrupted or missing |
Improvement | Open | Minor | Unresolved | Unassigned | Thawan Kooburat | Thawan Kooburat | 17/May/13 21:45 | 31/May/13 18:52 | 3.5.0 | server | 0 | 3 | ZOOKEEPER-1413 | It is possible that human error caused some txnlog files to be removed from the log dir. The leader should not use the txnlog to synchronize with the learner if it finds that a log is missing or a file is corrupted, since this can cause data inconsistency. |
328636 | No Perforce job exists for this issue. | 0 | 328979 | 6 years, 44 weeks, 5 days ago | 0|i1kp93: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1709 | Limit the size of txnlog file |
Improvement | Open | Minor | Unresolved | Thawan Kooburat | Thawan Kooburat | Thawan Kooburat | 17/May/13 21:38 | 17/May/13 21:40 | 3.5.0 | server | 0 | 2 | ZOOKEEPER-1413 | The server only creates a new log file after ~100k txns. The txnlog file can be quite large (> 1GB) if the request size is big. This will cause the server not to use the txnlog to sync with the learner. So we added a parameter so that the server creates a new txnlog file whenever the size exceeds the limit. |
328635 | No Perforce job exists for this issue. | 0 | 328978 | 6 years, 44 weeks, 5 days ago | 0|i1kp8v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1708 | Wrong version of java in control file for deb packages |
Bug | Resolved | Minor | Won't Fix | Johan Hillertz | Johan Hillertz | Johan Hillertz | 17/May/13 11:01 | 03/Mar/16 11:23 | 03/Mar/16 11:23 | 3.4.5 | 0 | 3 | ZOOKEEPER-1604 | After building the deb package it is not installable because of missing dependencies in the control file. Path: src/packages/deb/zookeeper.control/control If I remember correctly the package 'sun-java6-jre' is no longer provided by Ubuntu. If it is possible to run zookeeper in openjdk the correct string in the control file should be: "Depends: openjdk-6-jre" Or "Depends: openjdk-7-jre" |
328544 | No Perforce job exists for this issue. | 1 | 328888 | 4 years, 3 weeks ago | 0|i1koov: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1707 | Incorrect documentation of build dependencies for deb and rpm packages. |
Bug | Resolved | Minor | Won't Fix | Chris Nauroth | Johan Hillertz | Johan Hillertz | 17/May/13 10:16 | 03/Mar/16 11:23 | 03/Mar/16 11:23 | 3.4.5 | documentation | 0 | 3 | ZOOKEEPER-1893, ZOOKEEPER-1604 | Since I failed to build a deb package from the instructions, I found that the documentation in 'README_packaging.txt' for building Ubuntu packages can be improved. I have attached a suggested patch. Tested on Ubuntu 12.04 LTS |
documentation | 328540 | No Perforce job exists for this issue. | 2 | 328884 | 4 years, 3 weeks ago | 0|i1konz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1706 | Typo in Double Barriers example |
Bug | Closed | Minor | Fixed | Jingguo Yao | Jingguo Yao | Jingguo Yao | 13/May/13 02:05 | 13/Mar/14 14:17 | 13/May/13 03:34 | 3.4.5 | 3.4.6, 3.5.0 | documentation | 14/May/13 | 0 | 4 | For the Double Barriers example in the "ZooKeeper Recipes and Solutions" page, the P should be L in line 4 of the Leave pseudo code. | 327608 | No Perforce job exists for this issue. | 1 | 327952 | 6 years, 2 weeks ago | 0|i1kiwv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1705 | Certain implementations of C's rand() function coupled with the shuffle in libzookeeper_mt's getaddrs() produce a biased distribution of connections. |
Bug | Open | Minor | Unresolved | Unassigned | Stephen Tyree | Stephen Tyree | 10/May/13 14:56 | 10/May/13 14:56 | c client | 0 | 1 | Using libzookeeper_mt on an unsupported platform (OpenVMS) with a 5 server connection string, the fourth server in the connection string gets selected approximately only 6% of the time. This appears to be due to some strange properties of the LCG used in OpenVMS's C rand() function. Linux does not exhibit this behavior, but I can't speak for Windows, BSD, etc. It would be prudent, if libzookeeper_mt's behavior is intended to be the same on every platform it operates on (not that OpenVMS is one of those platforms), to use a PRNG of its own choosing. Integrating a defined PRNG, such as the mersenne twister, would give all platforms the same, correct behavior. |
327433 | No Perforce job exists for this issue. | 0 | 327777 | 6 years, 45 weeks, 6 days ago | 0|i1khtz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
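ZOOKEEPER-1705 above suggests shipping a PRNG the library controls instead of relying on the platform's rand(). A hedged Java illustration of the idea — the class name is hypothetical and the real fix would live in the C client — using Fisher-Yates (which is what Collections.shuffle implements): with an unbiased generator, every permutation of the server list is equally likely, regardless of platform.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ServerShuffle {
    // Shuffle the connection-string servers with a defined PRNG rather than
    // the platform rand(), so behavior is identical on every platform.
    static List<String> shuffled(List<String> servers, long seed) {
        List<String> copy = new ArrayList<>(servers);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```

Seeding is shown only to make the sketch deterministic for testing; a real client would seed from entropy so different clients spread their connections across the ensemble.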
| ZooKeeper | ZOOKEEPER-1704 | Please add download link for tutorial |
New Feature | Open | Trivial | Unresolved | Unassigned | Hayden Schultz | Hayden Schultz | 09/May/13 17:04 | 09/May/13 17:04 | 3.4.5 | documentation | 0 | 1 | http://zookeeper.apache.org/doc/r3.2.2/zookeeperTutorial.html | There's no obvious way to download the source file other than copy/paste. | 327238 | No Perforce job exists for this issue. | 0 | 327582 | 6 years, 46 weeks ago | 0|i1kgmn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1703 | Please add instructions for running the tutorial |
New Feature | Resolved | Minor | Fixed | Andor Molnar | Hayden Schultz | Hayden Schultz | 09/May/13 17:03 | 13/Oct/17 19:57 | 13/Oct/17 19:22 | 3.4.5 | 3.4.11, 3.5.4, 3.6.0 | documentation | 0 | 5 | tutorial http://zookeeper.apache.org/doc/r3.2.2/zookeeperTutorial.html | There's no instructions for running the tutorial. | newbie | 327237 | No Perforce job exists for this issue. | 0 | 327581 | 2 years, 22 weeks, 6 days ago | 0|i1kgmf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1702 | ZooKeeper client may write operation packets before receiving successful response to connection request, can cause TCP RST |
Bug | Closed | Major | Fixed | Chris Nauroth | Chris Nauroth | Chris Nauroth | 09/May/13 16:24 | 13/Mar/14 14:17 | 01/Jul/13 19:24 | 3.4.2 | 3.4.6, 3.5.0 | java client | 0 | 10 | HADOOP-9555, HADOOP-9556 | The problem occurs when a connection attempt is pending and there are multiple outbound packets in the queue for other operations. In {{ClientCnxnSocketNIO#doIO}}, it is possible to receive notification that the socket is writable for the next operation packet before receiving notification that the socket is readable for the connection response from the server. If the server decides that the session is expired, then it responds by immediately closing the socket on its side. If the client has written packets after the server has closed its end of the socket, then the TCP stack may choose to abort the connection with an RST. When this happens, the client doesn't receive an orderly shutdown, and ultimately it fails to deliver a session expired event to the application. | 327217 | No Perforce job exists for this issue. | 1 | 327561 | 6 years, 2 weeks ago |
Reviewed
|
0|i1kghz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1701 | When new and old config have the same version, no need to write new config to disk or create new connections |
Improvement | Resolved | Minor | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 08/May/13 20:53 | 01/Apr/14 07:10 | 31/Mar/14 21:46 | 3.5.0 | 3.5.0 | server | 0 | 3 | setLastSeenQuorumVerifier in QuorumPeer.java always writes the new config to disk and tries to make new connections to servers in new config. When the new config has the same version as the committed one (e.g., when the config received in a NEWLEADER message is already known to the follower), there's no need to write it to disk or to create new connections. | 327090 | No Perforce job exists for this issue. | 2 | 327434 | 5 years, 51 weeks, 2 days ago | 0|i1kfpr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1700 | FLETest consistently failing - setLastSeenQuorumVerifier seems to be hanging |
Bug | Resolved | Critical | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 07/May/13 20:43 | 12/May/13 07:09 | 11/May/13 08:51 | 3.5.0 | 3.5.0 | quorum | 0 | 5 | I'm consistently seeing a failure on my laptop when running the FLETest "testJoin" test. What seems to be happening is that the call to setLastSeenQuorumVerifier is hanging. See the following log from the test, notice 17:35:57 for the period in question. Note that I turned on debug logging and added a few log messages around the call to setLastSeenQuorumVerifier (you can see the code enter but never leave) Note: I've applied ZOOKEEPER-1324 to trunk code and then run this test but that doesn't seem to help. Also note that this test is passing consistently when run against branch-3.4. {noformat} 2013-05-07 17:35:57,859 [myid:] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Follower@65] - FOLLOWING - LEADER ELECTION TOOK - 16 2013-05-07 17:35:57,859 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@436] - LEADING - LEADER ELECTION TOOK - 17 2013-05-07 17:35:57,863 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:FileTxnSnapLog@270] - Snapshotting: 0x0 to /home/phunt/dev/zookeeper-trunk/build/test/tmp/test3690487600947307322.junit.dir/version-2/snapshot.0 2013-05-07 17:35:57,873 [myid:] - INFO [LearnerHandler-/127.0.0.1:34262:LearnerHandler@269] - Follower sid: 0 : info : 0.0.0.0:11222:11223:participant;0.0.0.0:11221 2013-05-07 17:35:57,878 [myid:] - INFO [LearnerHandler-/127.0.0.1:34262:LearnerHandler@328] - Synchronizing with Follower sid: 0 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x0 2013-05-07 17:35:57,878 [myid:] - DEBUG [LearnerHandler-/127.0.0.1:34262:LearnerHandler@395] - committedLog is empty but leader and follower are in sync, zxid=0x0 2013-05-07 17:35:57,878 [myid:] - INFO [LearnerHandler-/127.0.0.1:34262:LearnerHandler@404] - Sending DIFF 2013-05-07 17:35:57,879 [myid:] - DEBUG [LearnerHandler-/127.0.0.1:34262:LearnerHandler@411] - Sending 
NEWLEADER message to 0 2013-05-07 17:35:57,880 [myid:] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@331] - Getting a diff from the leader 0x0 2013-05-07 17:35:57,885 [myid:] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@457] - Learner received NEWLEADER message 2013-05-07 17:35:57,885 [myid:] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@460] - NEWLEADER calling configfromstring 2013-05-07 17:35:57,885 [myid:] - INFO [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Learner@462] - NEWLEADER setting quorum verifier 2013-05-07 17:35:57,886 [myid:] - WARN [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:QuorumPeer@1218] - setLastSeenQuorumVerifier called with stale config 0. Current version: 0 2013-05-07 17:36:01,880 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@585] - Shutting down 2013-05-07 17:36:01,881 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:Leader@591] - Shutdown called java.lang.Exception: shutdown Leader! reason: Waiting for a quorum of followers, only synced with sids: [ [1] ] at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:591) at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:487) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:949) 2013-05-07 17:36:01,881 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:ZooKeeperServer@398] - shutting down 2013-05-07 17:36:01,881 [myid:] - INFO [LearnerCnxAcceptor-0.0.0.0/0.0.0.0:11225:Leader$LearnerCnxAcceptor@398] - exception while shutting down acceptor: java.net.SocketException: Socket closed 2013-05-07 17:36:01,882 [myid:] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer@979] - PeerState set to LOOKING 2013-05-07 17:36:01,882 [myid:] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer@863] - LOOKING 2013-05-07 17:36:01,883 [myid:] - DEBUG [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:11224:QuorumPeer@792] - Initializing leader election protocol... {noformat} |
326895 | No Perforce job exists for this issue. | 2 | 327240 | 6 years, 45 weeks, 4 days ago | 0|i1ke8n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1699 | Leader should timeout and give up leadership when losing quorum of last proposed configuration |
Bug | Resolved | Blocker | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 03/May/13 18:47 | 21/May/14 18:54 | 21/May/14 13:49 | 3.5.0 | 3.5.0 | server | 0 | 10 | A leader gives up leadership when losing a quorum of the current configuration. This doesn't take into account any proposed configuration. So, if a reconfig operation is in progress and a quorum of the new configuration is not responsive, the leader will just get stuck waiting for it to ACK the reconfig operation, and will never timeout. |
326416 | No Perforce job exists for this issue. | 9 | 326761 | 5 years, 44 weeks, 1 day ago | 0|i1kbaf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1698 | Add deterministic host connection to Java client |
Improvement | Open | Minor | Unresolved | Unassigned | Owen Kim | Owen Kim | 01/May/13 21:04 | 26/Feb/14 17:45 | 2 | 4 | "C client has zoo_deterministic_conn_order() to make the connection order deterministic. We can add a similar feature to Java client." |
326094 | No Perforce job exists for this issue. | 0 | 326439 | 6 years, 4 weeks, 1 day ago | 0|i1k9av: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1697 | large snapshots can cause continuous quorum failure |
Bug | Closed | Critical | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 30/Apr/13 20:38 | 13/Mar/14 14:17 | 11/May/13 09:35 | 3.4.3, 3.5.0 | 3.4.6, 3.5.0 | server | 0 | 12 | ZOOKEEPER-1324 | I keep seeing this on the leader: 2013-04-30 01:18:39,754 INFO org.apache.zookeeper.server.quorum.Leader: Shutdown called java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 2 at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:447) at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:422) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753) The followers are downloading the snapshot when this happens, and are trying to do their first ACK to the leader, the ack fails with broken pipe. In this case the snapshots are large and the config has increased the initLimit. syncLimit is small - 10 or so with ticktime of 2000. Note this is 3.4.3 with ZOOKEEPER-1521 applied. I originally speculated that https://issues.apache.org/jira/browse/ZOOKEEPER-1521 might be related. I thought I might have broken something for this environment. That doesn't look to be the case. As it looks now it seems that 1521 didn't go far enough. The leader verifies that all followers have ACK'd to the leader within the last "syncLimit" time period. This runs all the time in the background on the leader to identify the case where a follower drops. In this case the followers take so long to load the snapshot that this check fails the very first time, as a result the leader drops (not enough ack'd followers w/in the sync limit) and re-election happens. This repeats forever. (the above error) this is the call: org.apache.zookeeper.server.quorum.LearnerHandler.synced() that's at odds. 
look at setting of tickOfLastAck in org.apache.zookeeper.server.quorum.LearnerHandler.run() It's not set until the follower first acks - in this case I can see that the followers are not getting to the ack prior to the leader shutting down due to the error log above. It seems that sync() should probably use the init limit until the first ack comes in from the follower. I also see that while tickOfLastAck and leader.self.tick is shared btw two threads there is no synchronization of the shared resources. |
325927 | No Perforce job exists for this issue. | 6 | 326272 | 6 years, 2 weeks ago | 0|i1k89r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1696 | Fail to run zookeeper client on Weblogic application server |
Bug | Closed | Critical | Fixed | Jeffrey Zhong | Dmitry Konstantinov | Dmitry Konstantinov | 24/Apr/13 09:39 | 13/Mar/14 14:16 | 27/Sep/13 19:26 | 3.4.5 | 3.4.6, 3.5.0 | java client | 6 | 12 | ZOOKEEPER-1554 | Java version: jdk170_06 WebLogic Server Version: 10.3.6.0 |
The problem in details is described here: http://comments.gmane.org/gmane.comp.java.zookeeper.user/2897 The provided link also contains a reference to fix implementation. {noformat} ####<Apr 24, 2013 1:03:28 PM MSK> <Warning> <org.apache.zookeeper.ClientCnxn> <devapp090> <clust2> <[ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (devapp090:2182)> <internal> <> <> <1366794208810> <BEA-000000> <WARN org.apache.zookeeper.ClientCnxn - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect java.lang.IllegalArgumentException: No Configuration was registered that can handle the configuration named Client at com.bea.common.security.jdkutils.JAASConfiguration.getAppConfigurationEntry(JAASConfiguration.java:130) at org.apache.zookeeper.client.ZooKeeperSaslClient.<init>(ZooKeeperSaslClient.java:97) at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:943) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993) > {noformat} |
324722 | No Perforce job exists for this issue. | 3 | 325067 | 6 years, 2 weeks ago | 0|i1k0uf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1695 | Inconsistent error code and type for new errors introduced by dynamic reconfiguration |
Bug | Resolved | Blocker | Fixed | Michi Mutsuzaki | Thawan Kooburat | Thawan Kooburat | 23/Apr/13 19:28 | 30/Apr/14 06:33 | 29/Apr/14 22:34 | 3.5.0 | 3.5.0 | server | 0 | 6 | In KeeperException.Code, RECONFIGINPROGRESS and NEWCONFIGNOQUORUM are declared as system errors. However, their error codes suggest that they are API errors. We either need to move them to the right type or use codes from the right range. |
324610 | No Perforce job exists for this issue. | 4 | 324955 | 5 years, 47 weeks, 1 day ago | 0|i1k05j: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1694 | ZooKeeper Leader sends a repeated NEWLEADER quorum packet to followers |
Bug | Resolved | Minor | Duplicate | Unassigned | Germán Blanco | Germán Blanco | 22/Apr/13 06:08 | 22/Apr/13 16:02 | 22/Apr/13 08:43 | 3.4.5, 3.5.0 | 3.5.0 | quorum | 0 | 3 | ZOOKEEPER-1324 | Windows, Linux, MacOSX | This is at least what the logs seem to show. It also seems to cause a second snapshot in the follower. | patch | 324273 | No Perforce job exists for this issue. | 1 | 324618 | 6 years, 48 weeks, 3 days ago | 0|i1jy33: |
| ZooKeeper | ZOOKEEPER-1693 | process may core or hang when xid is overflowed |
Bug | Resolved | Major | Duplicate | Unassigned | Jacky007 | Jacky007 | 19/Apr/13 08:14 | 09/Oct/13 02:38 | 09/Oct/13 02:38 | 3.4.5 | c client, java client | 0 | 1 | The xid will collide with AUTHXID (-4) when it overflows. If the process sends 4000 requests per second, it may core-dump or hang after about ten days. |
323946 | No Perforce job exists for this issue. | 0 | 324291 | 6 years, 24 weeks, 1 day ago | 0|i1jw2f: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
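The arithmetic in the report checks out: at 4000 requests per second, a signed 32-bit xid passes 2^31 after roughly six days, wraps negative, and keeps incrementing until it hits the reserved auth value. A minimal Java sketch of the collision (the class and method names are illustrative; the AUTHXID value is as stated in the report):

```java
// Demonstrates the collision: an int xid counter wraps around on overflow
// and can land exactly on the reserved AUTHXID value (-4).
public class XidOverflow {
    static final int AUTHXID = -4;  // reserved xid for auth packets

    // Advance the xid counter by one, as the client does per request.
    public static int next(int xid) {
        return xid + 1;  // silently wraps to Integer.MIN_VALUE after MAX_VALUE
    }

    public static boolean collidesWithAuth(int xid) {
        return xid == AUTHXID;
    }
}
```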
| ZooKeeper | ZOOKEEPER-1692 | Add support for single member ensemble |
Improvement | Open | Minor | Unresolved | Thawan Kooburat | Thawan Kooburat | Thawan Kooburat | 15/Apr/13 18:49 | 08/Apr/17 05:34 | 3.4.0 | quorum | 0 | 5 | In the past, we ran into a problem where the quorum could not be formed multiple times. It takes a while to investigate the root cause and fix the problem. Our current solution is to make it possible to run a quorum with a single member in it. Unlike standalone mode, it has to run as a LeaderZooKeeperServer, so that the observers can connect to it. This allows the operator to use this workaround to bring back the ensemble quickly while investigating the problem in the background. The main problem here is allowing the observers to connect to the leader when the quorum size is reduced to one. We don't want to update the (static) configuration on the observer since that requires a server restart. We are thinking of allowing the observer to connect to any participant which declares that it is the leader without running the leader election algorithm (because it won't have enough votes). |
323043 | No Perforce job exists for this issue. | 0 | 323388 | 2 years, 49 weeks, 5 days ago | 0|i1jqhr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1691 | Add a flag to disable standalone mode |
Improvement | Resolved | Major | Fixed | Helen Hastings | Michi Mutsuzaki | Michi Mutsuzaki | 15/Apr/13 15:41 | 28/Jan/14 13:47 | 20/Jan/14 23:46 | 3.5.0 | quorum | 3 | 9 | ZOOKEEPER-1870, ZOOKEEPER-1783 | Currently you cannot use dynamic reconfiguration to bootstrap zookeeper cluster because the server goes into standalone mode when there is only one server in the cluster. --Michi |
323014 | No Perforce job exists for this issue. | 8 | 323359 | 6 years, 9 weeks, 1 day ago | 0|i1jqbb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1690 | Race condition when close sock may cause a NPE in sendBuffer |
Bug | Open | Major | Unresolved | Unassigned | Jacky007 | Jacky007 | 15/Apr/13 07:53 | 15/Apr/13 07:59 | 3.4.6 | 0 | 2 | In NIOServerCnxn.java, close() closes the socket first and only then cancels the selection key: public void close() { closeSock(); ... sk.cancel(); Meanwhile, sendBuffer() reads the channel's interest ops and then writes to sock, which may already be null: public void sendBuffer(ByteBuffer bb) { if ((sk.interestOps() & SelectionKey.OP_WRITE) == 0) { ... sock.write(bb); I have noticed that the 3.5.0 branch has fixed the problem. |
322937 | No Perforce job exists for this issue. | 0 | 323282 | 6 years, 49 weeks, 3 days ago | 0|i1jpu7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
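The shape of the race and of a fix can be sketched as follows: snapshot the socket field into a local once, and null-check before writing. This models the behavior described above with a stand-in type; it is not the actual NIOServerCnxn code:

```java
import java.nio.ByteBuffer;

// Sketch of the hazard: close() nulls the socket while sendBuffer() is still
// running on another thread. Reading the field once into a local and
// null-checking avoids the NPE.
public class CnxnSketch {
    public interface ByteBufferSink { void write(ByteBuffer bb); }  // stand-in for SocketChannel

    private volatile ByteBufferSink sock;

    public CnxnSketch(ByteBufferSink s) { this.sock = s; }

    public void close() { sock = null; }  // models closeSock()

    // Returns true if the buffer was written, false if the socket was gone.
    public boolean sendBuffer(ByteBuffer bb) {
        ByteBufferSink s = sock;   // read the shared field exactly once
        if (s == null) {
            return false;          // racing close(): drop instead of NPE
        }
        s.write(bb);
        return true;
    }
}
```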
| ZooKeeper | ZOOKEEPER-1689 | Remove JVMFLAGS completely from clients, if CLIENT_JVMFLAGS are also set |
Bug | Open | Minor | Unresolved | Unassigned | Jeff Lord | Jeff Lord | 12/Apr/13 11:30 | 06/Feb/17 16:24 | 3.4.5 | scripts | 0 | 2 | In zkCli.sh, the CLIENT_JVMFLAGS are being passed along with the regular JVMFLAGS, so the latter ends up overriding them anyhow if set. Can we please remove JVMFLAGS completely from clients if CLIENT_JVMFLAGS are also set (i.e. use just one)? One example of how this can be detrimental: if you attempt to start a zookeeper-client session on the same host that is already running zookeeper, using the default config directory, and the zookeeper server has JMX enabled, then the client will also pick up that port and attempt to bind, resulting in a failure: # /usr/bin/zookeeper-client Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 9010; nested exception is: java.net.BindException: Address already in use |
322669 | No Perforce job exists for this issue. | 0 | 323014 | 3 years, 6 weeks, 3 days ago | 0|i1jo6v: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1688 | Transparent encryption of on-disk files |
New Feature | Open | Major | Unresolved | Unassigned | Andrew Kyle Purtell | Andrew Kyle Purtell | 10/Apr/13 15:28 | 07/May/14 10:35 | 3.5.0 | 0 | 9 | We propose to introduce optional transparent encryption of snapshots and commit logs on disk. The goal is to protect against the leakage of sensitive information from files at rest, due to accidental misconfiguration of filesystem permissions, improper decommissioning, or improper disk disposal. This change would introduce a new ServerConfig option that allows the administrator to select the desired persistence implementation by classname, and new persistence classes extending the File* classes that wrap current formats in encrypted containers. Otherwise, and by default, the current File* classes will be used without change. If enabled, transparent encryption of all on-disk structures will be accomplished with a shared cluster key made available to the quorum peers via the Java Keystore (supporting various store options, including hardware security module integration). Small modifications to the LogFormatter and SnapshotFormatter utilities will be needed. A new utility for offline key rotation will also be provided. These changes will not introduce any new dependencies. The standard Java Cryptographic Extensions (JCE) are sufficient for implementation and can benefit from potential acceleration options provided by JCE now or in the future. |
322312 | No Perforce job exists for this issue. | 1 | 322657 | 5 years, 46 weeks, 1 day ago | 0|i1jlzj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1687 | Number of past transactions retains in ZKDatabase.committedLog should be configurable |
Improvement | Open | Minor | Unresolved | Unassigned | Maho NAKATA | Maho NAKATA | 08/Apr/13 12:55 | 15/Apr/13 07:57 | 0 | 2 | ZKDatabase.committedLog retains the past 500 transactions. When memory usage matters more than speed, or vice versa, this should be configurable. | 321814 | No Perforce job exists for this issue. | 0 | 322159 | 6 years, 49 weeks, 3 days ago | memory, transactions | 0|i1jiwv: |
| ZooKeeper | ZOOKEEPER-1686 | Publish ZK 3.4.5 test jar |
Bug | Resolved | Major | Fixed | Patrick D. Hunt | Todd Lipcon | Todd Lipcon | 05/Apr/13 18:17 | 03/Oct/13 19:54 | 29/Sep/13 23:58 | 3.4.5 | 3.4.4, 3.4.5 | build, tests | 0 | 6 | HADOOP-8315, ZOOKEEPER-1430 | ZooKeeper 3.4.2 used to publish a jar with the tests classifier for use by downstream project tests. It seems this didn't get published for 3.4.4 or 3.4.5 (see https://repository.apache.org/index.html#nexus-search;quick~org.apache.zookeeper). Would someone mind please publishing these artifacts? | 321556 | No Perforce job exists for this issue. | 0 | 321901 | 6 years, 25 weeks, 3 days ago |
Reviewed
|
0|i1jhbj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1685 | Zookeeper client hard codes the server principal to zookeeper |
Bug | Open | Major | Unresolved | Unassigned | Arpit Gupta | Arpit Gupta | 05/Apr/13 17:03 | 30/May/19 11:38 | 3.4.5 | 0 | 3 | Noticed this while debugging a secure deploy. The server was started with the principal zk/_HOST. When a client tried to connect to it, it tried to set up a secure connection to server zookeeper/_HOST and failed authentication. In ClientCnxn.java: {code} try { zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/"+addr.getHostName()); } catch (LoginException e) { {code} |
321551 | No Perforce job exists for this issue. | 0 | 321896 | 42 weeks ago | 0|i1jhaf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1684 | Failure to update socket addresses on immediate connection |
Bug | Open | Major | Unresolved | Unassigned | Shevek | Shevek | 05/Apr/13 16:43 | 05/Feb/20 07:15 | 3.7.0, 3.5.8 | 0 | 1 | ZOOKEEPER-1683, CURATOR-7 | I quote: void registerAndConnect(SocketChannel sock, InetSocketAddress addr) throws IOException { sockKey = sock.register(selector, SelectionKey.OP_CONNECT); boolean immediateConnect = sock.connect(addr); if (immediateConnect) { sendThread.primeConnection(); } } In the immediate case, there are several bugs: a) updateSocketAddresses() is never called, as it is from the select loop in doTransport(). This means that clientCnxnSocket.getRemoteSocketAddress() will return null for the lifetime of this socket? b) OP_CONNECT is still in the interest set for the socket. c) updateLastSendAndHeard() is never called either. |
321550 | No Perforce job exists for this issue. | 1 | 321895 | 6 years, 49 weeks, 2 days ago | 0|i1jha7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1683 | ZooKeeper client NPE when updating server list on disconnected client |
Bug | Resolved | Major | Fixed | Alexander Shraer | Shevek | Shevek | 04/Apr/13 18:20 | 18/Jul/14 07:35 | 17/Jul/14 16:58 | 3.5.0 | 3.5.0 | java client | 0 | 8 | CURATOR-15, ZOOKEEPER-1684, ZOOKEEPER-1355, CURATOR-7 | 2013-04-04 22:16:15,872 ERROR [pool-4-thread-1] com.netflix.curator.ConnectionState.getZooKeeper (ConnectionState.java:84) - Background exception caught java.lang.NullPointerException at org.apache.zookeeper.client.StaticHostProvider.updateServerList(StaticHostProvider.java:161) ~[zookeeper-3.5.0.jar:3.5.0--1] at org.apache.zookeeper.ZooKeeper.updateServerList(ZooKeeper.java:183) ~[zookeeper-3.5.0.jar:3.5.0--1] at com.netflix.curator.HandleHolder$1$1.setConnectionString(HandleHolder.java:121) ~[curator-client-1.3.5-SNAPSHOT.jar:?] The duff code is this: ClientCnxnSocket clientCnxnSocket = cnxn.sendThread.getClientCnxnSocket(); InetSocketAddress currentHost = (InetSocketAddress) clientCnxnSocket.getRemoteSocketAddress(); boolean reconfigMode = hostProvider.updateServerList(serverAddresses, currentHost); Now, currentHost might be null if we're not yet connected, but StaticHostProvider.updateServerList dereferences it unconditionally. This would be caught by findbugs. |
321343 | No Perforce job exists for this issue. | 8 | 321688 | 5 years, 35 weeks, 6 days ago | 0|i1jg0n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
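The minimal guard the fix needs can be sketched as follows. The class and method names here are illustrative, not the actual StaticHostProvider code; the point is only that a null currentHost (a client that is not currently connected) must be handled before any dereference:

```java
import java.net.InetSocketAddress;
import java.util.Collection;

// Sketch of the missing null guard: when the client is disconnected,
// currentHost is null and updateServerList must not dereference it.
public class ServerListUpdate {
    // Returns true if the currently connected host is absent from the new
    // server list (so the client should migrate); never NPEs on a null host.
    public static boolean needsReconnect(Collection<InetSocketAddress> newServers,
                                         InetSocketAddress currentHost) {
        if (currentHost == null) {
            return false;  // not connected: nothing to migrate away from
        }
        return !newServers.contains(currentHost);
    }
}
```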
| ZooKeeper | ZOOKEEPER-1682 | Method to request all zookeepers in cluster |
Improvement | Resolved | Minor | Duplicate | Unassigned | John Vines | John Vines | 02/Apr/13 10:57 | 02/Apr/13 14:01 | 02/Apr/13 14:01 | 0 | 2 | ZOOKEEPER-107 | I would like to see an API feature to request the list of all servers in the cluster. The idea here is that a client doesn't have to know about all servers to benefit from the distributed nature of Zookeeper. If they can connect to one, they can have their bases covered from there on out. | 320783 | No Perforce job exists for this issue. | 0 | 321124 | 6 years, 51 weeks, 2 days ago | 0|i1jcjb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1681 | ZooKeeper 3.4.x can optionally use netty for nio but the pom does not declare the dep as optional |
Improvement | Patch Available | Major | Unresolved | Stevo Slavić | John Sirois | John Sirois | 02/Apr/13 09:05 | 05/Feb/20 07:11 | 3.4.0, 3.4.1, 3.4.2, 3.4.4, 3.4.5 | 3.7.0, 3.5.8 | 3 | 5 | ZOOKEEPER-823, ZOOKEEPER-1763, ZOOKEEPER-1715 | For example in [3.4.5|http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom] we see: {code} $ curl -sS http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom | grep -B1 -A4 org.jboss.netty <dependency> <groupId>org.jboss.netty</groupId> <artifactId>netty</artifactId> <version>3.2.2.Final</version> <scope>compile</scope> </dependency> {code} As a consumer I can depend on zookeeper with an exclude for org.jboss.netty#netty, or I can let my transitive dep resolver pick a winner. This might be fine, except for those who might be using a more modern netty published under the newish io.netty groupId. With this twist you get both org.jboss.netty#netty;foo and io.netty#netty;bar on your classpath, and runtime errors ensue from incompatibilities, unless you add an exclude against zookeeper (and clearly don't enable the zk netty nio handling). I propose that this is a pom bug, although this is debatable. Clearly, as currently packaged, zookeeper needs netty to compile, but I'd argue that since it does not need netty to run, either the scope should be provided or optional, or a zookeeper-netty lib should be broken out as an optional dependency, and this new dep published by zookeeper can have a proper compile dependency on netty. |
320756 | No Perforce job exists for this issue. | 1 | 321097 | 1 year, 45 weeks ago | 0|i1jcdb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1680 | Cannot connect with a given sessionId - it is discarded |
Bug | Open | Major | Unresolved | Unassigned | Shevek | Shevek | 29/Mar/13 19:39 | 02/Apr/13 17:33 | 0 | 1 | CURATOR-13, CURATOR-7 | While the API permits construction of a ZooKeeper client object with a given sessionId, the sessionId can never be used: ClientCnxn line 850: long sessId = (seenRwServerBefore) ? sessionId : 0; The only person who sets seenRwServerBefore is onConnected(). Therefore, it appears that passing a sessionId into a ZooKeeper constructor has no effect, as the ClientCnxn has never seen an RW server before, so it discards it anyway. |
320371 | No Perforce job exists for this issue. | 0 | 320712 | 6 years, 51 weeks, 6 days ago | 0|i1j9zr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
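The one-line logic quoted above is easy to reproduce in isolation: with seenRwServerBefore at its initial value of false, the caller-supplied sessionId never reaches the connect request. A minimal sketch (the class name is illustrative):

```java
// Reproduces the decision on ClientCnxn line 850: the supplied sessionId is
// only used once an RW server has been seen, which cannot have happened yet
// on the first connect, so a constructor-supplied sessionId is discarded.
public class SessionIdChoice {
    public static long effectiveSessionId(long sessionId, boolean seenRwServerBefore) {
        return seenRwServerBefore ? sessionId : 0;
    }
}
```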
| ZooKeeper | ZOOKEEPER-1679 | c client: use -Wdeclaration-after-statement |
Improvement | Resolved | Minor | Fixed | Michi Mutsuzaki | Michi Mutsuzaki | Michi Mutsuzaki | 29/Mar/13 15:34 | 21/Aug/13 07:06 | 21/Aug/13 05:53 | 3.4.5 | 3.5.0 | c client | 0 | 3 | Visual Studio still doesn't support C99. --Michi |
320340 | No Perforce job exists for this issue. | 1 | 320681 | 6 years, 31 weeks, 1 day ago | 0|i1j9sv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1678 | Server fails to join quorum when a peer is unreachable (5 ZK server setup) |
Bug | Open | Major | Unresolved | Unassigned | Julio Lopez | Julio Lopez | 28/Mar/13 20:19 | 09/Jul/13 05:43 | 3.4.5 | leaderElection | 1 | 7 | ZOOKEEPER-900 | java version "1.6.0_32" Java(TM) SE Runtime Environment (build 1.6.0_32-b05) Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode) Distributor ID: Ubuntu Description: Ubuntu 12.04.1 LTS Release: 12.04 Codename: precise uname -a Linux ha-vani3-0 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux |
In a 5-node ZK cluster setup, in the following state: * 1 host is down / not reachable. * 4 hosts are up. * 3 ZK servers are in quorum. * a 4th ZK server was restarted and is trying to re-join the quorum. The 4th server is not able to rejoin the quorum because the connection to the host that is down cannot be established, and apparently takes too long to time out. Stack traces and additional information coming. |
320189 | No Perforce job exists for this issue. | 0 | 320530 | 6 years, 37 weeks, 2 days ago | 0|i1j8vb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1677 | Misuse of INET_ADDRSTRLEN |
Bug | Open | Major | Unresolved | Marshall McMullen | Shevek | Shevek | 28/Mar/13 17:00 | 28/Feb/20 16:56 | 3.5.0 | 3.7.0, 3.5.8 | 1 | 6 | ZOOKEEPER-3726 | ZOOKEEPER-1355. Add zk.updateServerList(newServerList) (Alex Shraer, Marshall McMullen via fpj) git-svn-id: https://svn.apache.org/repos/asf/zookeeper/trunk@1410731 13f79535-47bb-0310-9956-ffa450edef68 +int addrvec_contains(const addrvec_t *avec, const struct sockaddr_storage *addr) +{ + if (!avec || !addr) + { + return 0; + } + + int i = 0; + for (i = 0; i < avec->count; i++) + { + if(memcmp(&avec->data[i], addr, INET_ADDRSTRLEN) == 0) + return 1; + } + + return 0; +} Pretty sure that should be sizeof(struct sockaddr_storage). INET_ADDRSTRLEN is the size of the character buffer which needs to be allocated for the return value of inet_ntop, so using it here seems totally wrong. |
320146 | No Perforce job exists for this issue. | 9 | 320487 | 1 year, 17 weeks ago | 0|i1j8lr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
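The size mismatch is concrete: INET_ADDRSTRLEN is 16 (a text-buffer length for inet_ntop), while sizeof(struct sockaddr_storage) is typically 128. A Java illustration of why comparing only the first 16 bytes misbehaves: a sockaddr_in6 lays out roughly family(2) + port(2) + flowinfo(4) + 16-byte address, so two IPv6 addresses differing only in their later bytes look "equal". The layout figures are assumptions about a typical platform, and this is an illustration in Java, not the C-client code:

```java
import java.util.Arrays;

// Models the bug: comparing only the first INET_ADDRSTRLEN (16) bytes of two
// address structures instead of the whole struct.
public class AddrCompare {
    static final int INET_ADDRSTRLEN = 16;   // a string-buffer size, not a struct size
    static final int STRUCT_LEN = 128;       // typical sizeof(struct sockaddr_storage)

    // Mirrors memcmp(a, b, INET_ADDRSTRLEN): ignores everything past byte 16.
    public static boolean buggyEquals(byte[] a, byte[] b) {
        return Arrays.equals(Arrays.copyOf(a, INET_ADDRSTRLEN),
                             Arrays.copyOf(b, INET_ADDRSTRLEN));
    }

    // Mirrors memcmp(a, b, sizeof(struct sockaddr_storage)): full comparison.
    public static boolean fixedEquals(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }
}
```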
| ZooKeeper | ZOOKEEPER-1676 | C client zookeeper_interest returning ZOK on Connection Loss |
Bug | Closed | Blocker | Not A Problem | Yunong Xiao | Yunong Xiao | Yunong Xiao | 28/Mar/13 13:09 | 04/Sep/16 00:57 | 22/May/16 18:18 | 3.4.3 | c client | 1 | 8 | ZOOKEEPER-2432 | All | I have a fairly simple single-threaded C client set up -- single-threaded because we are embedding zk in the node.js/libuv runtime -- which consists of the following algorithm: zookeeper_interest(); select(); // perform zookeeper api calls zookeeper_process(); I've noticed that zookeeper_interest in the C client never returns error if it is unable to connect to the zk server. From the spec of the zookeeper_interest API, I see that zookeeper_interest is supposed to return ZCONNECTIONLOSS when disconnected from the client. However, digging into the code, I see that the client is making a non-blocking connect call https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613 , and returning ZOK https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684 If we assume that the server is not up, this will mean that the subsequent select() call would return 0, since the fd is not ready, and future calls to zookeeper_interest will always return 0 and not the expected ZCONNECTIONLOSS. Thus an upstream client will never be aware that the connection is lost. I don't think this is the expected behavior. I have temporarily patched the zk C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's still unable to connect after session_timeout has been exceeded. I have included a patch for the client which fixes this for release 3.4.3 6b35e96 in this branch: https://github.com/yunong/zookeeper/tree/release-3.4.3-patched Here's the patch https://gist.github.com/yunong/efe869a0345867d54adf For more information, please see this email thread. http://mail-archives.apache.org/mod_mbox/zookeeper-dev/201211.mbox/%3C11A8E7C3-4DDE-45D8-ABEC-A8A4D32CF647@gmail.com%3E |
320094 | No Perforce job exists for this issue. | 0 | 320435 | 3 years, 43 weeks, 4 days ago | Ok, let's resolve this one and document it. | 0|i1j8a7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
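The behavior the reporter's patch adds can be sketched as a timeout check in the polling loop: stop reporting OK once the session timeout has elapsed without a completed connect. This is an illustration in Java of the control flow, not the C client itself; the error-code values follow zookeeper.h (ZOK = 0, ZCONNECTIONLOSS = -4):

```java
// Sketch of zookeeper_interest-style polling that reports connection loss
// once the session timeout is exceeded without a completed connect, instead
// of returning ZOK forever on a never-ready non-blocking connect.
public class InterestSketch {
    public static final int ZOK = 0;
    public static final int ZCONNECTIONLOSS = -4;

    // connectStartMs: when the non-blocking connect began;
    // nowMs: current time; sessionTimeoutMs: configured session timeout.
    public static int interest(boolean connected, long connectStartMs,
                               long nowMs, int sessionTimeoutMs) {
        if (!connected && nowMs - connectStartMs > sessionTimeoutMs) {
            return ZCONNECTIONLOSS;  // stop reporting ZOK after the deadline
        }
        return ZOK;
    }
}
```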
| ZooKeeper | ZOOKEEPER-1675 | Make sync a quorum operation |
Bug | Open | Major | Unresolved | Michael Han | Alexander Shraer | Alexander Shraer | 26/Mar/13 16:49 | 31/Oct/19 06:44 | 3.4.0, 3.5.0 | 0 | 8 | ZOOKEEPER-2136, ZOOKEEPER-3600 | sync + read is supposed to return at least the latest write that completes before the sync starts. This is true if the leader doesn't change, but when it does it may not work. The problem happens when the old leader L1 still thinks that it is the leader but some other leader L2 was already elected and committed some operations. Suppose that follower F is connected to L1 and invokes a sync. Even though L1 responds to the sync, the recent operations committed by L2 will not be flushed to F, so a subsequent read on F will not see these operations. To prevent this we should broadcast the sync like updates. This problem is also mentioned in Section 4.4 of the ZooKeeper paper (but the proposed solution there is insufficient to solve the issue). |
319651 | No Perforce job exists for this issue. | 0 | 319992 | 20 weeks, 2 days ago | 0|i1j5jr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1674 | There is no need to clear & load the database across leader election |
Improvement | Open | Major | Unresolved | Unassigned | Jacky007 | Jacky007 | 21/Mar/13 10:25 | 26/Jan/17 17:23 | 0 | 4 | ZOOKEEPER-2678 | It is interesting to notice this piece of code in QuorumPeer.java: /* ZKDatabase is a top level member of quorumpeer * which will be used in all the zookeeperservers * instantiated later. Also, it is created once on * bootup and only thrown away in case of a truncate * message from the leader */ private ZKDatabase zkDb; It was introduced by ZOOKEEPER-596. Now, we just drop the database on every leader election. We can keep it safely with ZOOKEEPER-1549. |
318719 | No Perforce job exists for this issue. | 0 | 319060 | 3 years, 50 weeks, 6 days ago | 0|i1izsn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1673 | Zookeeper doesn't support CIDR expressions in ACL with ip scheme |
Bug | Resolved | Minor | Fixed | Craig Condit | Lipin Dmitriy | Lipin Dmitriy | 19/Mar/13 06:01 | 26/Apr/14 07:04 | 25/Apr/14 17:41 | 3.4.5 | 3.5.0 | 1 | 8 | Currently, when I try to set an ACL with a CIDR expression, I get an exception: {code} [zk: localhost:2181(CONNECTED) 2] setAcl /AS0 ip:127.0.0.1/8:cdrwa Exception in thread "main" org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL for /AS0 at org.apache.zookeeper.KeeperException.create(KeeperException.java:112) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.setACL(ZooKeeper.java:1175) at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:716) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270) {code} Also, there is no support for CIDR in IPAuthenticationProvider.isValid, but IPAuthenticationProvider.matches has it. |
auth | 318211 | No Perforce job exists for this issue. | 3 | 318552 | 5 years, 47 weeks, 5 days ago | 0|i1iwnr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
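The matching that an ACL like ip:127.0.0.1/8:cdrwa requires can be sketched as a standard IPv4 prefix comparison. The class and method names below are illustrative, not the actual IPAuthenticationProvider code:

```java
// Minimal IPv4 CIDR matching sketch: an address is inside base/prefixLen
// if both agree on the top prefixLen bits.
public class CidrMatch {
    public static boolean matches(String base, int prefixLen, String addr) {
        int mask = prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
        return (toInt(base) & mask) == (toInt(addr) & mask);
    }

    // Packs a dotted quad into a 32-bit int, most significant octet first.
    static int toInt(String ip) {
        String[] p = ip.split("\\.");
        return (Integer.parseInt(p[0]) << 24) | (Integer.parseInt(p[1]) << 16)
             | (Integer.parseInt(p[2]) << 8)  |  Integer.parseInt(p[3]);
    }
}
```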
| ZooKeeper | ZOOKEEPER-1672 | zookeeper client does not accept "-members" option in reconfig command |
Bug | Resolved | Trivial | Fixed | Xiaoshuang Wang | Xiaoshuang Wang | Xiaoshuang Wang | 18/Mar/13 20:40 | 20/Mar/13 07:30 | 20/Mar/13 02:21 | 3.5.0 | 3.5.0 | java client | 0 | 5 | 0 | 0 | 0% | Zookeeper trunk | Without the modification to src/java/main/org/apache/zookeeper/cli/ReconfigCommand.java line 88, the reconfig command will not accept "-member" options, complaining about incorrect usage. | 0% | 0% | 0 | 0 | patch | 318169 | No Perforce job exists for this issue. | 1 | 318510 | 7 years, 1 week, 1 day ago |
Reviewed
|
0|i1iwef: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1671 | Remove dependency on log4j 1.2.15 |
Bug | Open | Minor | Unresolved | Unassigned | Alex Blewitt | Alex Blewitt | 18/Mar/13 09:22 | 18/Mar/13 09:22 | 0 | 1 | The zookeeper dependency 3.4.5 (latest) depends explicitly on log4j 1.2.15, which has dependencies on com.sun.jmx which can't be resolved from Maven central. Please change the dependency to either 1.2.16, which declares these as optional, or 1.2.14, which doesn't have them at all. http://search.maven.org/remotecontent?filepath=org/apache/zookeeper/zookeeper/3.4.5/zookeeper-3.4.5.pom <dependency> <groupId>log4j</groupId> <artifactId>log4j</artifactId> <version>1.2.15</version> <scope>compile</scope> </dependency> This should be modified to 1.2.14 or 1.2.16 as above. It's also not clear why this is used at all; it would be better for ZooKeeper to depend only on slf4j-api, and let users determine what the right slf4j logging implementation is. As things stand, it's not possible to swap out log4j for something else. |
318023 | No Perforce job exists for this issue. | 0 | 318364 | 7 years, 1 week, 3 days ago | 0|i1ivhz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1670 | zookeeper should set a default value for SERVER_JVMFLAGS and CLIENT_JVMFLAGS so that memory usage is controlled |
Bug | Resolved | Major | Fixed | Flavio Paiva Junqueira | Arpit Gupta | Arpit Gupta | 15/Mar/13 17:36 | 18/Dec/19 08:34 | 01/Oct/13 17:32 | 3.4.5 | 3.5.0 | 0 | 7 | We noticed this with JDK 1.6, where if no heap size is set, the process takes up to 1/4 of the memory available on the machine. More info: http://stackoverflow.com/questions/3428251/is-there-a-default-xmx-setting-for-java-1-5 You can run the following command to see what the defaults are for your machine: {code} java -XX:+PrintFlagsFinal -version 2>&1 | grep -i -E 'heapsize|permsize|version' {code} And we noticed on two different classes of machines that this was 1/4 of the total memory on the machine. |
317766 | No Perforce job exists for this issue. | 5 | 318107 | 13 weeks, 1 day ago | 0|i1itwv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1669 | Operations to server will be timed-out while thousands of sessions expired same time |
Improvement | Resolved | Major | Fixed | Cheney Sun | tokoot | tokoot | 15/Mar/13 03:32 | 03/Aug/17 21:47 | 03/Aug/17 12:23 | 3.3.5 | 3.4.11 | server | 0 | 11 | If there are thousands of clients, and most of them disconnect from the server at the same time (clients restarted or servers partitioned from clients), the server will be busy closing those "connections" and become unavailable. The problem is in the following: private void closeSessionWithoutWakeup(long sessionId) { HashSet<NIOServerCnxn> cnxns; synchronized (this.cnxns) { cnxns = (HashSet<NIOServerCnxn>)this.cnxns.clone(); // other threads will block here } ... } A real world example that demonstrated this problem (Kudos to [~sun.cheney]): {noformat} The issue is raised while tens of thousands of clients try to reconnect to the ZooKeeper service. Actually, we came across the issue while maintaining our HBase cluster, which used a 5-server ZooKeeper cluster. The HBase cluster was composed of many, many regionservers (in the thousands), and was connected to by tens of thousands of clients doing massive reads/writes. Because the r/w throughput was very high, the ZooKeeper zxid increased quickly as well. Basically, every two or three weeks, ZooKeeper would hold a leader re-election triggered by the zxid rolling over. The leader re-election caused the clients (HBase regionservers and HBase clients) to disconnect and reconnect with the ZooKeeper servers at the same time, and to try to renew their sessions. In the current implementation of session renewal, NIOServerCnxnFactory first clones all the connections in order to avoid race conditions between threads, and then iterates over the cloned connection set one by one to find the related session to renew. It's very time consuming. In our case (described above), it caused many region servers to fail to renew their sessions before the session timeout, and eventually the HBase cluster lost these region servers, affecting HBase stability.
The change is to refactor the close-session logic and introduce a ConcurrentHashMap to store the session-id-to-connection mapping, which is a thread-safe data structure and eliminates the need to clone the connection set first. {noformat} |
performance | 317643 | No Perforce job exists for this issue. | 0 | 317984 | 2 years, 32 weeks, 6 days ago | 0|i1it5j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
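The refactoring the report proposes (a session-id-to-connection map in a ConcurrentHashMap, so closing one session is a single lookup rather than a clone-and-scan of the whole set under a lock) can be sketched as follows. Class and method names are illustrative, not the actual NIOServerCnxnFactory code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the proposed fix: keep sessionId -> connection in a
// ConcurrentHashMap so closeSession is O(1) and lock-free for callers,
// instead of cloning the full connection set inside a synchronized block.
public class SessionMapSketch {
    private final Map<Long, Object> bySession = new ConcurrentHashMap<>();

    public void register(long sessionId, Object cnxn) {
        bySession.put(sessionId, cnxn);
    }

    // Returns the connection that was closed, or null if none was registered.
    public Object closeSession(long sessionId) {
        return bySession.remove(sessionId);  // no clone, no global lock
    }

    public int openCount() {
        return bySession.size();
    }
}
```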
| ZooKeeper | ZOOKEEPER-1668 | “Memory leak” about permgen |
Improvement | Resolved | Major | Not A Problem | Unassigned | tokoot | tokoot | 15/Mar/13 03:24 | 10/Oct/13 13:46 | 10/Oct/13 13:46 | 3.3.5 | jmx, server | 0 | 2 | For each connection, a ConnectionBean will be created to represent the connection in finishSessionInit: | ... | jmxConnectionBean = new ConnectionBean(this, zk); | MBeanRegistry.getInstance().register(jmxConnectionBean, zk.jmxServerBean); || ... || ObjectName oname = makeObjectName(path, bean); ||| ... ||| return new ObjectName(beanName.toString()); |||| ... |||| _canonicalName = (new String(canonical_chars, 0, prop_index)).intern(); So, for every connection, it takes dozens of bytes of permgen. With connections being established constantly, permgen usage will increase continuously. Is it reasonable or necessary to manage each connection with a ConnectionBean? |
317641 | No Perforce job exists for this issue. | 0 | 317982 | 6 years, 24 weeks, 1 day ago | 0|i1it53: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1667 | Watch event isn't handled correctly when a client reestablish to a server |
Bug | Closed | Blocker | Fixed | Flavio Paiva Junqueira | Jacky007 | Jacky007 | 14/Mar/13 04:56 | 12/May/15 01:37 | 22/Oct/13 06:56 | 3.3.6, 3.4.5 | 3.4.6, 3.5.0 | server | 1 | 14 | ZOOKEEPER-2182 | When a client reestablishes a connection to a server, it will send the watches which have not been triggered. But the code in DataTree does not handle them correctly. It is obvious; we just did not notice it :) Scenario: 1) Client A sets a data watch on /d, then disconnects; client B deletes /d and creates it again. When client A reestablishes its connection to ZK, it will receive a NodeCreated rather than a NodeDataChanged. 2) Client A sets an exists watch on /e (which does not exist), then disconnects; client B creates /e. When client A reestablishes its connection to ZK, it will receive a NodeDataChanged rather than a NodeCreated. |
317480 | No Perforce job exists for this issue. | 5 | 317821 | 4 years, 45 weeks, 2 days ago | 0|i1is5b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1666 | Avoid Reverse DNS lookup if the hostname in connection string is literal IP address. |
Improvement | Closed | Major | Fixed | George Cao | George Cao | George Cao | 14/Mar/13 00:58 | 30/Jan/17 07:08 | 13/Nov/13 09:18 | 3.4.5 | 3.4.6, 3.5.0 | java client | 0 | 12 | ZOOKEEPER-1652, ZOOKEEPER-2184, OOZIE-1959 | In our environment, if InetSocketAddress.getHostName() is called and the host names in the connection string are literal IP addresses, the call triggers a reverse DNS lookup, which is very slow. In this situation, the host name can simply be set to the IP without causing any problem. |
patch, test | 317447 | No Perforce job exists for this issue. | 4 | 317788 | 6 years, 2 weeks ago | Try to avoid reverse name service look up when the connection string consists of literal IP addresses but not real host names. | 0|i1irxz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
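The fix above hinges on not asking the resolver for a name when the connection string already contains a literal IP. A minimal sketch of the idea in plain Java (this is an illustration, not the actual ZOOKEEPER-1666 patch): `InetSocketAddress.getHostString()` (Java 7+) returns the literal string the address was created with, while `getHostName()` may perform a reverse DNS lookup.

```java
import java.net.InetSocketAddress;

public class HostStringDemo {
    // Returns a printable host for an address without risking a reverse DNS
    // lookup: getHostString() hands back the literal the address was built
    // with, whereas getHostName() may query the resolver and block.
    static String safeHost(InetSocketAddress addr) {
        return addr.getHostString();
    }

    public static void main(String[] args) {
        InetSocketAddress addr = new InetSocketAddress("10.0.0.1", 2181);
        System.out.println(safeHost(addr)); // prints 10.0.0.1
    }
}
```

On pre-Java-7 code bases (as ZooKeeper was in 2013), the equivalent is to test whether the string parses as a literal IP before ever calling `getHostName()`.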
| ZooKeeper | ZOOKEEPER-1665 | Support recursive deletion in multi |
New Feature | Resolved | Major | Won't Fix | Unassigned | Ted Yu | Ted Yu | 14/Mar/13 00:08 | 02/Apr/14 14:38 | 02/Apr/14 14:38 | 0 | 5 | HBASE-7847 | Use case in HBase is that we need to recursively delete multiple subtrees: {code} ZKUtil.deleteChildrenRecursively(watcher, acquiredZnode); ZKUtil.deleteChildrenRecursively(watcher, reachedZnode); ZKUtil.deleteChildrenRecursively(watcher, abortZnode); {code} To achieve high consistency, it is desirable to use multi for the above operations. This JIRA adds support for recursive deletion in multi. |
317442 | No Perforce job exists for this issue. | 0 | 317783 | 5 years, 51 weeks, 1 day ago | 0|i1irwv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
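Recursive deletion over multi requires that children be deleted before their parents in a single atomic batch. A hedged sketch of the ordering logic (the `deleteOrder` helper and the in-memory tree are mine for illustration; the real feature would enumerate children from the server and emit `Op.delete()` entries in this order):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeleteOrder {
    // Post-order walk of a znode tree: every child appears in the result
    // before its parent, which is the order a multi of delete ops needs.
    static List<String> deleteOrder(String root, Map<String, List<String>> tree) {
        List<String> order = new ArrayList<>();
        for (String child : tree.getOrDefault(root, Collections.emptyList())) {
            order.addAll(deleteOrder(root + "/" + child, tree));
        }
        order.add(root);
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = new HashMap<>();
        tree.put("/locks", Arrays.asList("a", "b"));
        tree.put("/locks/a", Arrays.asList("x"));
        System.out.println(deleteOrder("/locks", tree));
        // → [/locks/a/x, /locks/a, /locks/b, /locks]
    }
}
```

Doing this client-side (list children, then submit one multi) is only best-effort: a concurrent create between the listing and the multi makes the batch fail, which is why server-side support was requested.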
| ZooKeeper | ZOOKEEPER-1664 | Kerberos auth doesn't work with native platform GSS integration |
Bug | Resolved | Major | Fixed | Unassigned | Boaz Kelmer | Boaz Kelmer | 13/Mar/13 14:58 | 11/Sep/13 18:00 | 11/Sep/13 18:00 | 3.4.5, 3.5.0 | java client, server | 0 | 6 | Linux (and likely also Solaris). | Java on Linux/Solaris can be set up to use the native (via C library) GSS implementation. This is configured by setting the system property sun.security.jgss.native=true When using this feature, ZooKeeper Sasl/JGSS authentication doesn't work. The reason is explained in http://docs.oracle.com/javase/6/docs/technotes/guides/security/jgss/jgss-features.html """ [when using native GSS...] In addition, when performing operations as a particular Subject, e.g. Subject.doAs(...) or Subject.doAsPrivileged(...), the to-be-used GSSCredential should be added to Subject's private credential set. Otherwise, the GSS operations will fail since no credential is found. """ |
317331 | No Perforce job exists for this issue. | 3 | 317672 | 6 years, 28 weeks, 1 day ago | 0|i1ir87: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1663 | scripts don't work when path contains spaces |
Bug | Closed | Minor | Fixed | Amichai Rothman | Amichai Rothman | Amichai Rothman | 12/Mar/13 11:25 | 26/Jun/14 07:19 | 20/May/13 13:12 | 3.4.5 | 3.4.6, 3.5.0 | scripts | 0 | 6 | Kubuntu 12.10 (GNU bash 4.2.37) | The shell scripts (bin/zk*.sh) don't work when there are spaces in the zookeeper or java paths. | 317082 | No Perforce job exists for this issue. | 4 | 317423 | 5 years, 39 weeks ago | 0|i1ipov: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1662 | Fix to two small bugs in ReconfigTest.testPortChange() |
Bug | Resolved | Minor | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 08/Mar/13 19:27 | 11/Mar/14 07:11 | 10/Mar/14 21:45 | 3.5.0 | 3.5.0 | tests | 0 | 4 | Fix to two small bugs in ReconfigTest.testPortChange(): 1. the test expected a port change to happen immediately, which is not necessarily going to happen. The fix waits a bit and also tries several times. 2. when a client port changes, the test created a new ZooKeeper handle, but didn't specify a Watcher object, which generated some NullPointerException events when the watcher was triggered. |
316640 | No Perforce job exists for this issue. | 1 | 316982 | 6 years, 2 weeks, 2 days ago | 0|i1imz3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
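The first fix described above, waiting and retrying instead of asserting a port change immediately, is a standard pattern for testing asynchronous behavior. A sketch under my own names (not the actual ReconfigTest code):

```java
import java.util.function.BooleanSupplier;

public class RetryUntil {
    // Polls a condition until it holds or the deadline passes, instead of
    // asserting it once; asynchronous changes such as a client-port rebind
    // need time to take effect.
    static boolean retryUntil(BooleanSupplier cond, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!cond.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) return false;
            Thread.sleep(intervalMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        boolean ok = retryUntil(() -> System.currentTimeMillis() - start > 50, 1000, 10);
        System.out.println(ok); // true: condition became true within the timeout
    }
}
```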
| ZooKeeper | ZOOKEEPER-1661 | Random (?) 5s delay when establishing connection |
Bug | Open | Major | Unresolved | Unassigned | Yan Pujante | Yan Pujante | 07/Mar/13 16:11 | 12/Mar/13 06:41 | 3.4.5 | server | 1 | 6 | ZOOKEEPER-1476, ZOOKEEPER-1652 | I have a client connecting to ZooKeeper and I am sometimes seeing a 5s delay before the opening of the socket connection: Here is the output on the client side: {noformat} 2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT 2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:host.name=xeon 2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.version=1.6.0_41 2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.vendor=Apple Inc. 2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.home=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home 2013/03/07 10:53:48.729 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.class.path=/local/java/lib/tools.jar:lib/ant-1.8.4.jar:lib/ant-antlr-1.8.4.jar:lib/ant-junit-1.8.4.jar:lib/ant-launcher-1.8.4.jar:lib/antlr-2.7.7.jar:lib/asm-4.0.jar:lib/asm-analysis-4.0.jar:lib/asm-commons-4.0.jar:lib/asm-tree-4.0.jar:lib/asm-util-4.0.jar:lib/commons-cli-1.2.jar:lib/groovy-2.0.7.jar:lib/groovy-ant-2.0.7.jar:lib/groovy-groovydoc-2.0.7.jar:lib/groovy-templates-2.0.7.jar:lib/groovy-xml-2.0.7.jar:lib/jackson-annotations-2.1.4.jar:lib/jackson-core-2.1.4.jar:lib/jackson-databind-2.1.4.jar:lib/jline-0.9.94.jar:lib/json-20090211.jar:lib/jul-to-slf4j-1.6.2.jar:lib/junit-3.8.1.jar:lib/log4j-1.2.16.jar:lib/netty-3.2.2.Final.jar:lib/org.linkedin.util-core-1.8.glu47.0.jar:lib/org.linkedin.util-groovy-1.8.glu47.0.jar:lib/org.linkedin.zookeeper-cli-impl-1.5.glu47.0-SNAPSHOT.jar:lib/org.linkedin.zookeeper-impl-1.5.glu47.0-SNAPSHOT.jar:lib/slf4j-api-1.6.2.jar:lib/slf4j-log4j12-1.6.2.jar:lib/zookeeper-3.4.5.jar 2013/03/07 10:53:48.730 INFO 
[org.apache.zookeeper.ZooKeeper] Client environment:java.library.path=/local/instantclient10:.:/Users/ypujante/Library/Java/Extensions:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java 2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.io.tmpdir=/var/folders/dj/qmkmx5648xjf2n006s7hc1v80000gq/T/ 2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:java.compiler=<NA> 2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:os.name=Mac OS X 2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:os.arch=x86_64 2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:os.version=10.8.2 2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:user.name=ypujante 2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:user.home=/Users/ypujante 2013/03/07 10:53:48.730 INFO [org.apache.zookeeper.ZooKeeper] Client environment:user.dir=/export/content/linkedin-zookeeper/org.linkedin.zookeeper-cli-1.5.glu47.0-SNAPSHOT 2013/03/07 10:53:48.731 INFO [org.apache.zookeeper.ZooKeeper] Initiating client connection, connectString=localhost:2181 sessionTimeout=100 watcher=org.linkedin.zookeeper.client.ZKClient@3823bdd1 2013/03/07 10:53:48.737 DEBUG [org.apache.zookeeper.ClientCnxn] zookeeper.disableAutoWatchReset is false 2013/03/07 10:53:48.756 DEBUG [org.linkedin.zookeeper.cli.ClientMain] Talking to zookeeper on localhost:2181 2013/03/07 10:53:53.763 INFO [org.apache.zookeeper.ClientCnxn] Opening socket connection to server fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:2181. 
Will not attempt to authenticate using SASL (Unable to locate a login configuration) {noformat} From this output you can see the line at 10:53:48 => Initiating client connection, and then 5s later, at 10:53:53 => Opening socket connection. Note that I did not see this delay/problem prior to upgrading to 3.4.5 (from 3.3.3). Also note that sometimes there is no delay, as in the following output! {noformat} 2013/03/07 11:04:06.084 INFO [org.apache.zookeeper.ZooKeeper] Initiating client connection, connectString=localhost:2181 sessionTimeout=100 watcher=org.linkedin.zookeeper.client.ZKClient@1e670479 2013/03/07 11:04:06.089 DEBUG [org.apache.zookeeper.ClientCnxn] zookeeper.disableAutoWatchReset is false 2013/03/07 11:04:06.109 DEBUG [org.linkedin.zookeeper.cli.ClientMain] Talking to zookeeper on localhost:2181 2013/03/07 11:04:06.116 INFO [org.apache.zookeeper.ClientCnxn] Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration) {noformat} I will be more than happy to provide more details if necessary. The client code is open source and hosted on github @ https://github.com/linkedin/linkedin-zookeeper/blob/master/org.linkedin.zookeeper-cli-impl/src/main/groovy/org/linkedin/zookeeper/cli/ClientMain.groovy#L65 and is not doing much but (under the covers) new ZooKeeper("localhost:2181", 100, watcher), then waiting until the SyncConnected event is received... Thanks Yan |
316385 | No Perforce job exists for this issue. | 0 | 316728 | 7 years, 2 weeks, 2 days ago | 0|i1ilen: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1660 | ZOOKEEPER-1987 Add documentation for dynamic reconfiguration |
Sub-task | Resolved | Blocker | Fixed | Reed Wanderman-Milne | Alexander Shraer | Alexander Shraer | 07/Mar/13 02:07 | 10/Nov/14 18:47 | 29/Aug/14 10:39 | 3.5.0 | 3.5.1, 3.6.0 | documentation | 0 | 10 | ZOOKEEPER-1411, ZOOKEEPER-1355, ZOOKEEPER-107, ZOOKEEPER-1727 | Update user manual with reconfiguration info. | 316233 | No Perforce job exists for this issue. | 3 | 316576 | 5 years, 19 weeks, 3 days ago |
Reviewed
|
0|i1ikgv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1659 | Add JMX support for dynamic reconfiguration |
Bug | Resolved | Blocker | Fixed | Rakesh Radhakrishnan | Alexander Shraer | Alexander Shraer | 07/Mar/13 01:50 | 04/Jun/14 16:50 | 04/Jun/14 16:08 | 3.5.0 | 3.5.0 | server | 0 | 10 | ZOOKEEPER-107 | We need to update JMX during reconfigurations. Currently, reconfiguration changes are not reflected in JConsole. | 316231 | No Perforce job exists for this issue. | 8 | 316574 | 5 years, 42 weeks, 1 day ago | 0|i1ikgf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1658 | Support SRV records |
Improvement | Open | Minor | Unresolved | Unassigned | Devin Bayer | Devin Bayer | 03/Mar/13 07:12 | 03/Mar/13 07:12 | 2 | 3 | We want to make client configuration easy, so the quorum is just a single A rrset with multiple IP addresses. This isn't ideal because we need to hard-code the IPs of our zookeeper servers, and they already have domain names. If zookeeper supported SRV, we could just do: _zookeeper.example.com. 86400 IN SRV 10 60 2181 worker1 _zookeeper.example.com. 86400 IN SRV 10 20 2181 worker2 _zookeeper.example.com. 86400 IN SRV 10 10 2181 worker3 and -Dhbase.zookeeper.quorum=example.com |
dns, zookeeper | 315513 | No Perforce job exists for this issue. | 0 | 315857 | 7 years, 3 weeks, 4 days ago | 0|i1ig1b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1657 | Increased CPU usage by unnecessary SASL checks |
Bug | Closed | Major | Fixed | Philip K. Warren | Gunnar Wagenknecht | Gunnar Wagenknecht | 01/Mar/13 11:44 | 13/Mar/14 14:16 | 18/Sep/13 06:58 | 3.4.5 | 3.4.6, 3.5.0 | java client | 1 | 10 | ZOOKEEPER-1512, ZOOKEEPER-1623, ZOOKEEPER-1764 | I did some profiling in one of our Java environments and found an interesting footprint in ZooKeeper. The SASL support seems to be triggered very often on the client although it's not even in use. Is there a switch to disable SASL completely? The attached screenshot shows a 10-minute profiling session on one of our production Jetty servers. The Jetty server handles ~1k web requests per minute. The average response time per web request is a few milliseconds. The profiling was performed on a machine running for >24h. We noticed a significant CPU increase on our servers when deploying an update from ZooKeeper 3.3.2 to ZooKeeper 3.4.5. Thus, we started investigating. The screenshot shows that only 32% of CPU time is spent in Jetty. In contrast, 65% is spent in ZooKeeper. A few notes/thoughts: * {{ClientCnxn$SendThread.clientTunneledAuthenticationInProgress}} seems to be the culprit * {{javax.security.auth.login.Configuration.getConfiguration}} seems to be called very often * There is quite a bit of reflection involved in {{java.security.AccessController.doPrivileged}} * No security manager is active in the JVM: I tend to place an if-check in the code before calling {{AccessController.doPrivileged}}. When no SM is installed, the runnable can be called directly, which saves cycles. |
performance | 315344 | No Perforce job exists for this issue. | 9 | 315688 | 6 years, 2 weeks ago | 0|i1iezr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
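The reporter's suggested if-check, sketched below. This mirrors the 2013-era proposal (on current JDKs the SecurityManager and `AccessController` are deprecated, so this is historical illustration rather than recommended modern practice); the method name is mine:

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

public class PrivilegedShortcut {
    // Skip the AccessController machinery when no SecurityManager is
    // installed: calling the action directly avoids the per-call overhead
    // the profiling session attributed to doPrivileged.
    static <T> T doPrivilegedIfNeeded(PrivilegedAction<T> action) {
        if (System.getSecurityManager() == null) {
            return action.run();   // direct call, no privilege machinery
        }
        return AccessController.doPrivileged(action);
    }

    public static void main(String[] args) {
        String v = doPrivilegedIfNeeded(() -> System.getProperty("java.version"));
        System.out.println(v != null); // true
    }
}
```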
| ZooKeeper | ZOOKEEPER-1656 | OSGI - Missing import package - ClassNotFoundException |
Bug | Open | Major | Unresolved | Unassigned | Florian Pirchner | Florian Pirchner | 01/Mar/13 11:20 | 09/Oct/13 02:33 | 3.4.5 | 1 | 4 | OSGi | "Import-Package" entries are missing for bundle org.apache.hadoop.zookeeper. I am getting an exception running the ZooKeeper server in an OSGi environment. ZooKeeperServerMain uses: import org.slf4j.Logger; import org.slf4j.LoggerFactory; But there is no corresponding import in MANIFEST.MF: Import-Package: javax.management,org.apache.log4j,org.osgi.framework;version="[1.4,2.0)",org.osgi.util.tracker;version="[1.1,2.0)" I am sure that another missing package would be a subpackage of org.apache.log4j, like org.apache.log4j.jmx. Best, Florian |
315339 | No Perforce job exists for this issue. | 0 | 315683 | 6 years, 24 weeks, 1 day ago | 0|i1ieyn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1655 | Make jline dependency optional in maven pom |
Bug | Resolved | Major | Fixed | Thomas Weise | Thomas Weise | Thomas Weise | 28/Feb/13 00:07 | 11/Oct/16 14:15 | 01/Oct/13 17:42 | 3.4.2 | 3.5.0 | build | 0 | 7 | HADOOP-9342, YARN-2815, ZOOKEEPER-1249, ZOOKEEPER-1718 | Old JLine version used in ZK CLI should not be pulled into downstream projects. |
315051 | No Perforce job exists for this issue. | 2 | 315395 | 3 years, 23 weeks, 2 days ago |
Reviewed
|
0|i1id6n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1654 | bad documentation link on site |
Bug | Open | Minor | Unresolved | Michael Han | Camille Fournier | Camille Fournier | 27/Feb/13 16:00 | 06/Sep/17 09:44 | 3.4.5 | 0 | 3 | If you go to this page: http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html Then try to click on Developer -> API Docs you'll get to http://zookeeper.apache.org/doc/trunk/api/index.html Which does not exist. Should point to: http://zookeeper.apache.org/doc/current/api/index.html |
314984 | No Perforce job exists for this issue. | 0 | 315328 | 2 years, 28 weeks, 1 day ago | 0|i1icrr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1653 | zookeeper fails to start because of inconsistent epoch |
Bug | Closed | Blocker | Fixed | Michi Mutsuzaki | Michi Mutsuzaki | Michi Mutsuzaki | 26/Feb/13 20:59 | 12/Jan/16 10:25 | 26/Nov/13 18:44 | 3.4.5 | 3.4.6 | quorum | 1 | 11 | ZOOKEEPER-2162 | It looks like QuorumPeer.loadDataBase() could fail if the server was restarted after zk.takeSnapshot() but before finishing self.setCurrentEpoch(newEpoch) in Learner.java. {code:java} case Leader.NEWLEADER: // it will be NEWLEADER in v1.0 zk.takeSnapshot(); self.setCurrentEpoch(newEpoch); // <<< got restarted here snapshotTaken = true; writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true); break; {code} The server fails to start because currentEpoch is still 1 but the last processed zkid from the snapshot has been updated. {noformat} 2013-02-20 13:45:02,733 5543 [pool-1-thread-1] ERROR org.apache.zookeeper.server.quorum.QuorumPeer - Unable to load database on disk java.io.IOException: The current epoch, 1, is older than the last zxid, 8589934592 at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:439) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:413) ... {noformat} {noformat} $ find datadir datadir datadir/version-2 datadir/version-2/currentEpoch.tmp datadir/version-2/acceptedEpoch datadir/version-2/snapshot.0 datadir/version-2/currentEpoch datadir/version-2/snapshot.200000000 $ cat datadir/version-2/currentEpoch.tmp 2% $ cat datadir/version-2/acceptedEpoch 2% $ cat datadir/version-2/currentEpoch 1% {noformat} |
314830 | No Perforce job exists for this issue. | 5 | 315174 | 4 years, 10 weeks, 2 days ago | ZOOKEEPER-1549.patch should fix this issue in 3.5 branch. | 0|i1ibtj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
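The `currentEpoch.tmp` file left in the data directory above is the trace of a write-temp-then-rename protocol interrupted mid-step. A generic sketch of that protocol (file names borrowed from the listing; the helper itself is mine, not ZooKeeper's AtomicFileOutputStream): a crash leaves either the old epoch or the new one, never a torn value, though as the bug shows the rename can still land out of order with respect to the snapshot.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicEpochWrite {
    // Write the new epoch to a temp file, then atomically rename it over the
    // real file. Readers see either the old or the new contents, never a mix.
    static void writeEpoch(Path dir, long epoch) throws IOException {
        Path tmp = dir.resolve("currentEpoch.tmp");
        Files.write(tmp, Long.toString(epoch).getBytes());
        Files.move(tmp, dir.resolve("currentEpoch"),
                StandardCopyOption.ATOMIC_MOVE,
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("epoch");
        writeEpoch(dir, 2);
        System.out.println(new String(Files.readAllBytes(dir.resolve("currentEpoch"))));
    }
}
```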
| ZooKeeper | ZOOKEEPER-1652 | zookeeper java client does a reverse dns lookup when connecting |
Bug | Resolved | Critical | Duplicate | Sean Bridges | Sean Bridges | Sean Bridges | 26/Feb/13 13:45 | 26/Oct/16 10:25 | 04/Nov/13 21:34 | 3.4.5 | java client | 1 | 10 | ZOOKEEPER-1661, ZOOKEEPER-1666 | When connecting to zookeeper, the client does a reverse dns lookup on the hostname. In our environment, the reverse dns lookup takes 5 seconds to fail, causing zookeeper clients to connect slowly. The reverse dns lookup occurs in ClientCnx in the calls to adr.getHostName() {code} setName(getName().replaceAll("\\(.*\\)", "(" + addr.getHostName() + ":" + addr.getPort() + ")")); try { zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/"+addr.getHostName()); } catch (LoginException e) { {code} |
314711 | No Perforce job exists for this issue. | 1 | 315055 | 3 years, 21 weeks, 1 day ago | 0|i1ib33: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1651 | Add support for compressed snapshot |
Improvement | Resolved | Major | Fixed | Brian Nixon | Thawan Kooburat | Thawan Kooburat | 25/Feb/13 18:56 | 02/May/19 04:47 | 02/May/19 04:46 | 3.6.0 | server | 0 | 7 | ZOOKEEPER-3179 | We want to keep many copies of snapshots on disk so that we can debug problems afterward. However, a snapshot can be large, so we added a feature that allows the server to dump/load snapshots in a compressed format (snappy or gzip). This also benefits db loading and snapshotting time. The gain depends on client workload: in one of our deployments, where clients don't compress their data, we found that snappy compression works best. The snapshot size is reduced from 381MB to 65MB, and db loading and snapshotting time is reduced by 20%. |
314518 | No Perforce job exists for this issue. | 0 | 314862 | 46 weeks ago | 0|i1i9w7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
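Streaming compression of a snapshot file is a thin wrapper around the serialization path. A minimal gzip round-trip sketch using only the JDK (snappy, which the ticket prefers, needs a third-party library; the helper names here are mine):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSnapshot {
    // Compress a serialized snapshot before writing it to disk.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Decompress a snapshot read back from disk before deserializing it.
    static byte[] decompress(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) > 0) bos.write(buf, 0, n);
            return bos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        // Repetitive data, like uncompressed znode payloads, compresses well.
        byte[] snap = new String(new char[10000]).replace('\0', 'z').getBytes();
        byte[] packed = compress(snap);
        System.out.println(packed.length < snap.length); // true
        System.out.println(decompress(packed).length);   // 10000
    }
}
```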
| ZooKeeper | ZOOKEEPER-1650 | testServerCnxnExpiry failing consistently on solaris apache jenkins |
Bug | Resolved | Blocker | Duplicate | Rakesh Radhakrishnan | Patrick D. Hunt | Patrick D. Hunt | 20/Feb/13 12:30 | 16/Mar/14 12:37 | 16/Mar/14 08:21 | 3.5.0 | 3.5.0 | tests | 0 | 2 | ZOOKEEPER-1862 | testServerCnxnExpiry is failing consistently on solaris apache jenkins: https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-solaris/475/testReport/org.apache.zookeeper.test/ServerCnxnTest/testServerCnxnExpiry/ Seems to have started around the time the NIO multi-threading changes were introduced - but it's hard to say (some of the history has been lost already). Possibly just a bad test or timeouts not long enough... |
313725 | No Perforce job exists for this issue. | 0 | 314070 | 6 years, 1 week, 4 days ago | 0|i1i507: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1649 | Build RPM Package Error on CentOS 5 |
Bug | Resolved | Major | Won't Fix | Unassigned | Shining | Shining | 19/Feb/13 04:47 | 03/Mar/16 11:22 | 03/Mar/16 11:22 | 3.4.5 | build | 0 | 1 | CentOS 5.8 x86_64 JDK 1.6.0_21-b06 |
ant rpm -------------------- rpm: [copy] Copying 1 file to /tmp/zkpython_build_nshi/SOURCES [rpm] Building the RPM based on the zkpython.spec file [rpm] Executing(%prep): /bin/sh -e /var/tmp/rpm-tmp.62078 [rpm] Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.62078 [rpm] Executing(%install): /bin/sh -e /var/tmp/rpm-tmp.62078 [rpm] [rpm] [rpm] RPM build errors: [rpm] + umask 022 [rpm] + cd /tmp/zkpython_build_nshi/BUILD [rpm] + LANG=C [rpm] + export LANG [rpm] + unset DISPLAY [rpm] + tar fxz /tmp/zkpython_build_nshi/SOURCES/ZooKeeper-0.4.linux-x86_64.tar.gz -C /tmp/zkpython_build_nshi/BUILD [rpm] + exit 0 [rpm] + umask 022 [rpm] + cd /tmp/zkpython_build_nshi/BUILD [rpm] + LANG=C [rpm] + export LANG [rpm] + unset DISPLAY [rpm] + exit 0 [rpm] + umask 022 [rpm] + cd /tmp/zkpython_build_nshi/BUILD [rpm] + LANG=C [rpm] + export LANG [rpm] + unset DISPLAY [rpm] + /bin/mv /tmp/zkpython_build_nshi/BUILD/usr /tmp/zkpython_build_nshi/BUILD [rpm] /bin/mv: `/tmp/zkpython_build_nshi/BUILD/usr' and `/tmp/zkpython_build_nshi/BUILD/usr' are the same file [rpm] error: Bad exit status from /var/tmp/rpm-tmp.62078 (%install) [rpm] Bad exit status from /var/tmp/rpm-tmp.62078 (%install) BUILD FAILED /home/nshi/workspace/zookeeper-3.4.5/build.xml:955: The following error occurred while executing this line: /home/nshi/workspace/zookeeper-3.4.5/src/contrib/build.xml:75: The following error occurred while executing this line: /home/nshi/workspace/zookeeper-3.4.5/src/contrib/zkpython/build.xml:144: '/usr/bin/rpmbuild' failed with exit code 1 ------------- |
313457 | No Perforce job exists for this issue. | 0 | 313802 | 4 years, 3 weeks ago | 0|i1i3cn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1648 | Fix WatcherTest in JDK7 |
Bug | Closed | Minor | Fixed | Thawan Kooburat | Thawan Kooburat | Thawan Kooburat | 18/Feb/13 18:27 | 13/Mar/14 14:17 | 19/Feb/13 02:56 | 3.4.6, 3.5.0 | tests | 0 | 4 | ZOOKEEPER-1557, ZOOKEEPER-1147 | JDK7 run unit tests in random order causing intermittent WatcherTest failure. The fix is to clean up static variable that interfere with other tests. | 313408 | No Perforce job exists for this issue. | 2 | 313753 | 6 years, 2 weeks ago |
Reviewed
|
0|i1i31r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1647 | OSGi package import/export changes not applied to bin-jar |
Bug | Closed | Major | Fixed | Arnoud Glimmerveen | Arnoud Glimmerveen | Arnoud Glimmerveen | 17/Feb/13 10:03 | 13/Mar/14 14:16 | 19/Feb/13 03:29 | 3.4.6, 3.5.0 | 3.4.6, 3.5.0 | 0 | 4 | ZOOKEEPER-1334, ZOOKEEPER-1645 | Two recent changes related to the OSGi headers Import-Package and Export-Package (ZOOKEEPER-1334 and ZOOKEEPER-1645) were only applied to the JAR created in ant target *jar*, leaving the JAR created in target *bin-jar* (to be uploaded to Maven central) with the old (incorrect) OSGi headers. | 313248 | No Perforce job exists for this issue. | 1 | 313593 | 6 years, 2 weeks ago |
Reviewed
|
0|i1i227: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1646 | mt c client tests fail on Ubuntu Raring |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | James Page | James Page | 12/Feb/13 05:07 | 13/Mar/14 14:17 | 17/Oct/13 13:08 | 3.4.5, 3.5.0 | 3.4.6, 3.5.0 | c client, tests | 0 | 5 | ZOOKEEPER-1742, ZOOKEEPER-1795 | Ubuntu 13.04 (raring), glibc 2.17 | Misc tests fail in the c client binding under the current Ubuntu development release: ./zktest-mt ZooKeeper server startedRunning Zookeeper_clientretry::testRetry ZooKeeper server started ZooKeeper server started : elapsed 9315 : OK Zookeeper_operations::testAsyncWatcher1 : assertion : elapsed 1054 Zookeeper_operations::testAsyncGetOperation : assertion : elapsed 1055 Zookeeper_operations::testOperationsAndDisconnectConcurrently1 : assertion : elapsed 1066 Zookeeper_operations::testOperationsAndDisconnectConcurrently2 : elapsed 0 : OK Zookeeper_operations::testConcurrentOperations1 : assertion : elapsed 1055 Zookeeper_init::testBasic : elapsed 1 : OK Zookeeper_init::testAddressResolution : elapsed 0 : OK Zookeeper_init::testMultipleAddressResolution : elapsed 0 : OK Zookeeper_init::testNullAddressString : elapsed 0 : OK Zookeeper_init::testEmptyAddressString : elapsed 0 : OK Zookeeper_init::testOneSpaceAddressString : elapsed 0 : OK Zookeeper_init::testTwoSpacesAddressString : elapsed 0 : OK Zookeeper_init::testInvalidAddressString1 : elapsed 0 : OK Zookeeper_init::testInvalidAddressString2 : elapsed 175 : OK Zookeeper_init::testNonexistentHost : elapsed 92 : OK Zookeeper_init::testOutOfMemory_init : elapsed 0 : OK Zookeeper_init::testOutOfMemory_getaddrs1 : elapsed 0 : OK Zookeeper_init::testOutOfMemory_getaddrs2 : elapsed 1 : OK Zookeeper_init::testPermuteAddrsList : elapsed 0 : OK Zookeeper_close::testIOThreadStoppedOnExpire : assertion : elapsed 1056 Zookeeper_close::testCloseUnconnected : elapsed 0 : OK Zookeeper_close::testCloseUnconnected1 : elapsed 91 : OK Zookeeper_close::testCloseConnected1 : assertion : elapsed 1056 Zookeeper_close::testCloseFromWatcher1 : assertion : elapsed 1076 
Zookeeper_simpleSystem::testAsyncWatcherAutoReset ZooKeeper server started : elapsed 12155 : OK Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK Zookeeper_simpleSystem::testNullData : elapsed 1031 : OK Zookeeper_simpleSystem::testIPV6 : elapsed 1005 : OK Zookeeper_simpleSystem::testPath : elapsed 1024 : OK Zookeeper_simpleSystem::testPathValidation : elapsed 1053 : OK Zookeeper_simpleSystem::testPing : elapsed 17287 : OK Zookeeper_simpleSystem::testAcl : elapsed 1019 : OK Zookeeper_simpleSystem::testChroot : elapsed 3052 : OK Zookeeper_simpleSystem::testAuth : assertion : elapsed 7010 Zookeeper_simpleSystem::testHangingClient : elapsed 1015 : OK Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal ZooKeeper server started ZooKeeper server started ZooKeeper server started : elapsed 20556 : OK Zookeeper_simpleSystem::testWatcherAutoResetWithLocal ZooKeeper server started ZooKeeper server started ZooKeeper server started : elapsed 20563 : OK Zookeeper_simpleSystem::testGetChildren2 : elapsed 1041 : OK Zookeeper_multi::testCreate : elapsed 1017 : OK Zookeeper_multi::testCreateDelete : elapsed 1007 : OK Zookeeper_multi::testInvalidVersion : elapsed 1011 : OK Zookeeper_multi::testNestedCreate : elapsed 1009 : OK Zookeeper_multi::testSetData : elapsed 6019 : OK Zookeeper_multi::testUpdateConflict : elapsed 1014 : OK Zookeeper_multi::testDeleteUpdateConflict : elapsed 1007 : OK Zookeeper_multi::testAsyncMulti : elapsed 2001 : OK Zookeeper_multi::testMultiFail : elapsed 1006 : OK Zookeeper_multi::testCheck : elapsed 1020 : OK Zookeeper_multi::testWatch : elapsed 2013 : OK Zookeeper_watchers::testDefaultSessionWatcher1zktest-mt: tests/ZKMocks.cc:271: SyncedBoolCondition DeliverWatchersWrapper::isDelivered() const: Assertion `i<1000' failed. Aborted (core dumped) It would appear that the zookeeper connection does not transition to connected within the required time; I increased the time allowed but no change. 
Ubuntu raring has glibc 2.17; the test suite works fine on previous Ubuntu releases and this is the only difference that stood out. Interestingly the cli_mt worked just fine connecting to the same zookeeper instance that the tests left lying around so I'm assuming this is a test error rather than an actual bug. |
312414 | No Perforce job exists for this issue. | 1 | 312760 | 6 years, 2 weeks ago |
Reviewed
|
0|i1hwwv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1645 | ZooKeeper OSGi package imports not complete |
Bug | Closed | Major | Fixed | Arnoud Glimmerveen | Arnoud Glimmerveen | Arnoud Glimmerveen | 12/Feb/13 02:08 | 13/Mar/14 14:16 | 15/Feb/13 19:50 | 3.4.6, 3.5.0 | 3.4.6, 3.5.0 | 0 | 5 | ZOOKEEPER-1647 | The ZooKeeper bundle relies on three packages it currently does not declare in the Import-Package MANIFEST header: {{javax.security.auth.callback}} , {{javax.security.auth.login}} and {{javax.security.sasl}} . By adding these the ZooKeeper jar will be a valid OSGi bundle. | 312388 | No Perforce job exists for this issue. | 1 | 312734 | 6 years, 2 weeks ago |
Reviewed
|
0|i1hwr3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1644 | Add support for compressed SetWatches packet |
Improvement | Open | Major | Unresolved | Unassigned | Thawan Kooburat | Thawan Kooburat | 11/Feb/13 17:13 | 15/Feb/13 19:42 | c client, java client, server | 0 | 3 | ZOOKEEPER-706 | On reconnecting to a server to restore its session, a client has to send all watched paths to the server via a SetWatches packet. This packet can be large and can exceed the server-side buffer (jute.maxbuffer), causing the session to fail. We have 2 concerns: 1. We can increase jute.maxbuffer to an arbitrary size as a simple workaround, but in our use case the number of watches is going to keep growing. 2. If a large number of clients get disconnected at once, the server may receive a large amount of data over the network because of the flood of SetWatches packets. In our case, the watch paths should be highly compressible. So our current plan is to add a new type of request which is a compressed set-watches request. It should be possible to support multiple compression schemes. We will probably use snappy compression but may add gzip as a default to minimize external dependency requirements. Feel free to comment if you have any suggestions. |
312325 | No Perforce job exists for this issue. | 0 | 312671 | 7 years, 5 weeks, 5 days ago | 0|i1hwd3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1643 | Windows: fetch_and_add not 64bit-compatible, may not be correct |
Bug | Resolved | Major | Fixed | Erik Anderson | Erik Anderson | Erik Anderson | 08/Feb/13 20:44 | 01/Aug/17 11:42 | 20/Feb/13 00:26 | 3.3.3 | 3.5.0, 3.4.11 | c client | 0 | 6 | BIGTOP-2082 | Windows 7 Microsoft Visual Studio 2005 |
Note: While I am using a really old version of ZK, I did do enough "SVN Blame" operations to realize that this code hasn't changed. I am currently attempting to compile the C client under MSVC 2005 arch=x64. There are three things I can see with fetch_and_add() inside of /src/c/src/mt_adapter.c (1) MSVC 2005 64bit will not compile inline _asm sections. I'm moderately sure this code is x86-specific so I'm unsure whether it should attempt to either. (2) The Windows intrinsic InterlockedExchangeAdd [http://msdn.microsoft.com/en-us/library/windows/desktop/ms683597(v=vs.85).aspx] appears to do the same thing this code is attempting to do (3) I'm really rusty on my assembly, but why are we doing two separate XADD operations here, and is the code as-written anything approaching atomicity? If you want an official patch I likely can do an SVN checkout and submit a patch the replaces the entire #else on lines 495-505 with a "return InterlockedExchangeAdd(operand, incr);" Usually when I'm scratching my head this badly there's something I'm missing though. As far as I can tell there has been no prior discussion on this code. |
312042 | No Perforce job exists for this issue. | 1 | 312388 | 2 years, 33 weeks, 2 days ago | 0|i1hum7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1642 | Leader loading database twice |
Bug | Closed | Major | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 08/Feb/13 05:30 | 13/Mar/14 14:17 | 16/May/13 13:33 | 3.4.6, 3.5.0 | 0 | 7 | The leader server currently loads the database before running leader election when trying to figure out the zxid it needs to use for the election and again when it starts leading. This is problematic for larger databases so we should remove the redundant load if possible. The code references are: # getLastLoggedZxid() in QuorumPeer; # loadData() in ZooKeeperServer. |
311917 | No Perforce job exists for this issue. | 2 | 312263 | 6 years, 2 weeks ago | 0|i1htun: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1641 | Using slope=positive results in a jagged ganglia graph of packets rcvd/sent |
Bug | Resolved | Minor | Fixed | Ben Hartshorne | Ben Hartshorne | Ben Hartshorne | 06/Feb/13 13:17 | 16/Feb/13 06:02 | 15/Feb/13 20:00 | 3.5.0 | contrib | 0 | 3 | The ganglia python module uses 'slope=positive' when submitting zk_packets_received and zk_packets_sent. This results in a graph that is jagged (alternating valid results with zeros) at the highest resolution and under-represents the actual value at all averaged resolutions (>1hr). The module should be changed to calculate the delta in requests and report requests per second instead. |
311603 | No Perforce job exists for this issue. | 1 | 311949 | 7 years, 5 weeks, 5 days ago |
Reviewed
|
0|i1hrwv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1640 | dynamically load command objects in zk |
Improvement | Resolved | Minor | Not A Problem | Tian Hong Wang | Tian Hong Wang | Tian Hong Wang | 05/Feb/13 04:21 | 26/Feb/13 03:49 | 26/Feb/13 03:49 | 3.4.5 | java client | 0 | 3 | In class org.apache.zookeeper.ZooKeeperMain.java:
{code}
new CloseCommand().addToMap(commandMapCli);
new CreateCommand().addToMap(commandMapCli);
new DeleteCommand().addToMap(commandMapCli);
new DeleteAllCommand().addToMap(commandMapCli);
// Depricated: rmr
new DeleteAllCommand("rmr").addToMap(commandMapCli);
new SetCommand().addToMap(commandMapCli);
new GetCommand().addToMap(commandMapCli);
new LsCommand().addToMap(commandMapCli);
new Ls2Command().addToMap(commandMapCli);
new GetAclCommand().addToMap(commandMapCli);
new SetAclCommand().addToMap(commandMapCli);
new StatCommand().addToMap(commandMapCli);
new SyncCommand().addToMap(commandMapCli);
new SetQuotaCommand().addToMap(commandMapCli);
new ListQuotaCommand().addToMap(commandMapCli);
new DelQuotaCommand().addToMap(commandMapCli);
new AddAuthCommand().addToMap(commandMapCli);
{code}
The above code is not flexible for command object scalability. It would be better to refine the code to load and create the command objects dynamically. |
patch | 311332 | No Perforce job exists for this issue. | 1 | 311678 | 7 years, 4 weeks, 2 days ago | 0|i1hq8n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1639 | zk.getZKDatabase().deserializeSnapshot adds new system znodes instead of replacing existing ones |
Bug | Open | Major | Unresolved | Unassigned | Alexander Shraer | Alexander Shraer | 02/Feb/13 22:05 | 08/Oct/13 16:31 | 3.4.5 | 0 | 3 | Before the call to zk.getZKDatabase().deserializeSnapshot in Learner.java, zk.getZKDatabase().getDataTree().getNode("/zookeeper") == zk.getZKDatabase().getDataTree().procDataNode, which means that this is the same znode, as it should be. However, after this call, they are not equal. The node actually being used in client operations is zk.getZKDatabase().getDataTree().getNode("/zookeeper"), but the other old node procDataNode is still there and not replaced (in fact it is a final field). |
311056 | No Perforce job exists for this issue. | 0 | 311401 | 6 years, 24 weeks, 2 days ago | 0|i1hoj3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1638 | Redundant zk.getZKDatabase().clear(); |
Improvement | Resolved | Trivial | Fixed | neil bhakta | Alexander Shraer | Alexander Shraer | 02/Feb/13 18:02 | 12/Mar/14 20:39 | 12/Mar/14 18:58 | 3.5.0 | 0 | 7 | Learner.syncWithLeader calls zk.getZKDatabase().clear() right before zk.getZKDatabase().deserializeSnapshot(leaderIs); Then the first thing deserializeSnapshot does is another clear(). Suggest to remove the clear() in syncWithLeader. |
newbie | 311044 | No Perforce job exists for this issue. | 2 | 311389 | 6 years, 2 weeks ago | 0|i1hogf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1637 | Intermittent Segfault with zkpython in pyzoo_exists |
Bug | Open | Major | Unresolved | Unassigned | Robert Schultheis | Robert Schultheis | 01/Feb/13 15:29 | 03/Feb/13 02:32 | 3.4.3, 3.4.4, 3.4.5 | 0 | 1 | We are getting an intermittent segfault. This is OSX, zookeeper compiled using brew. I've tried 3.4.3 - 3.4.5. I used GDB to get the following backtrace: {code}
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
[Switching to process 10366 thread 0x1d03]
0x00007fff8e0984f0 in strlen ()
(gdb) backtrace
#0  0x00007fff8e0984f0 in strlen ()
#1  0x00000001004983cc in prepend_string ()
#2  0x0000000100498451 in Request_path_init ()
#3  0x0000000100499e94 in zoo_awexists ()
#4  0x000000010049a036 in zoo_wexists ()
#5  0x000000010048170b in pyzoo_exists ()
#6  0x000000010008c5d8 in PyEval_EvalFrameEx ()
#7  0x000000010008ecd8 in PyEval_EvalCodeEx ()
#8  0x000000010008ee6c in PyEval_EvalCode ()
#9  0x000000010008be0a in PyEval_EvalFrameEx ()
#10 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#11 0x000000010008ee6c in PyEval_EvalCode ()
#12 0x000000010008be0a in PyEval_EvalFrameEx ()
#13 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#14 0x000000010002cabf in PyClassMethod_New ()
#15 0x000000010000bd32 in PyObject_Call ()
#16 0x000000010008c5ec in PyEval_EvalFrameEx ()
#17 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#18 0x000000010002cabf in PyClassMethod_New ()
#19 0x000000010000bd32 in PyObject_Call ()
#20 0x000000010001a6e9 in PyInstance_New ()
#21 0x000000010000bd32 in PyObject_Call ()
#22 0x0000000100055c5d in _PyObject_SlotCompare ()
#23 0x000000010000bd32 in PyObject_Call ()
#24 0x000000010008bf63 in PyEval_EvalFrameEx ()
#25 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#26 0x000000010008ee6c in PyEval_EvalCode ()
#27 0x000000010008be0a in PyEval_EvalFrameEx ()
#28 0x000000010008edf7 in PyEval_EvalCode ()
#29 0x000000010008be0a in PyEval_EvalFrameEx ()
#30 0x000000010008ecd8 in PyEval_EvalCodeEx ()
#31 0x000000010002cabf in PyClassMethod_New ()
#32 0x000000010000bd32 in PyObject_Call ()
#33 0x000000010001a6e9 in PyInstance_New ()
#34 0x000000010000bd32 in PyObject_Call ()
#35 0x0000000100087c40 in PyEval_CallObjectWithKeywords ()
#36 0x00000001000b940d in initthread ()
#37 0x00007fff8e0448bf in _pthread_start ()
#38 0x00007fff8e047b75 in thread_start ()
{code} |
310930 | No Perforce job exists for this issue. | 0 | 311275 | 7 years, 7 weeks, 4 days ago | 0|i1hnr3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1636 | c-client crash when zoo_amulti failed |
Bug | Closed | Critical | Fixed | Michael K. Edwards | Thawan Kooburat | Thawan Kooburat | 30/Jan/13 22:54 | 01/Aug/19 20:50 | 10/Dec/18 09:29 | 3.4.3 | 3.6.0, 3.5.5, 3.4.15 | c client | 0 | 5 | 0 | 9000 | deserialize_response for multi operations doesn't handle the case where the server fails to send back a response (e.g. when the multi packet is too large). The c-client will try to process the completion of all sub-requests as if the operation had succeeded and will eventually cause SIGSEGV. |
100% | 100% | 9000 | 0 | pull-request-available | 310569 | No Perforce job exists for this issue. | 5 | 310914 | 1 year, 14 weeks, 3 days ago | 0|i1hljb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1635 | ZooKeeper C client doesn't compile on 64 bit Windows |
Improvement | Resolved | Major | Invalid | Unassigned | Tomas Gutierrez | Tomas Gutierrez | 30/Jan/13 16:02 | 24/Apr/14 17:15 | 23/Apr/14 18:02 | 3.5.0 | 10 | 8 | Windows x64 systems. | The x64 target does not support _asm inline (see: http://msdn.microsoft.com/en-us/library/4ks26t93(v=vs.80).aspx). The proposal is to use a native Windows function which is still valid for i386 and x64 architectures. In order to avoid any potential break, a compilation directive has been added, but the best fix would be the removal of the asm part. Sample code:
{code}
int32_t fetch_and_add(volatile int32_t* operand, int incr)
{
#ifndef WIN32
    int32_t result;
    asm __volatile__(
        "lock xaddl %0,%1\n"
        : "=r"(result), "=m"(*(int *)operand)
        : "0"(incr)
        : "memory");
    return result;
#else
#ifdef WIN32_NOASM
    InterlockedExchangeAdd(operand, incr);
    return *operand;
#else
    volatile int32_t result;
    _asm
    {
        mov eax, operand;  // eax = v;
        mov ebx, incr;     // ebx = i;
        mov ecx, 0x0;      // ecx = 0;
        lock xadd dword ptr [eax], ecx;
        lock xadd dword ptr [eax], ebx;
        mov result, ecx;   // result = ecx;
    }
    return result;
#endif
#endif
}
{code} |
310494 | No Perforce job exists for this issue. | 0 | 310839 | 5 years, 48 weeks ago | 0|i1hl2n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
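For readers more at home in Java than x86 assembly, the contract both fetch_and_add reports above (ZOOKEEPER-1643, ZOOKEEPER-1635) are after — atomically add an increment and return the value held *before* the add, which is what a single lock xadd gives, and what returning InterlockedExchangeAdd's result would give — is exactly java.util.concurrent.atomic's getAndAdd. A small illustrative sketch (not ZooKeeper code):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FetchAndAddDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);

        // getAndAdd is the JVM's fetch-and-add: atomically add the increment
        // and return the value the variable held *before* the add.
        int before = counter.getAndAdd(5);
        System.out.println(before + " -> " + counter.get()); // prints "0 -> 5"

        // Hammer the counter from several threads; with a true single
        // atomic fetch-and-add no increments are lost.
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    counter.getAndAdd(1);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter.get()); // prints 400005
    }
}
```

This also illustrates the reporter's atomicity concern: the quoted C code issues two separate XADDs, so while no increments are lost, another thread can slip in between them and the returned "old value" can be stale — a single atomic fetch-and-add avoids that window.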
| ZooKeeper | ZOOKEEPER-1634 | A new feature proposal to ZooKeeper: authentication enforcement |
New Feature | Resolved | Major | Fixed | Michael Han | Jaewoong Choi | Jaewoong Choi | 30/Jan/13 14:35 | 26/Sep/19 19:44 | 24/Jul/19 12:01 | 3.4.5 | 3.6.0 | security, server | 4 | 14 | 259200 | 243000 | 16200 | 6% | ZOOKEEPER-2462, ZOOKEEPER-3561, ZOOKEEPER-2526 | Up to version 3.4.5, ZooKeeperServer doesn't force authentication if the client doesn't supply any auth info through a ZooKeeper#addAuthInfo method invocation. Hence, every znode should have at least one ACL assigned; otherwise any unauthenticated client can do anything to it. The current authentication/authorization mechanism of ZooKeeper described above has several problems: 1. From a security standpoint, a malicious client can access any znode which doesn't have proper authorization access control set. 2. From a runtime performance standpoint, authorization for every znode on every operation is unnecessarily but always evaluated against a client who bypassed the authentication phase. In other words, the current mechanism doesn't address the following requirement: "We want to protect a ZK server by enforcing a simple authentication on every client, no matter which znode it is trying to access. Every connection (or operation) from a client is rejected rather than established if it doesn't come with valid authentication information. As we don't have any other distinction between znodes in terms of authorization, we don't want any ACLs on any znode." To address the issues mentioned above, we propose a feature called "authentication enforcement" for the ZK source. The idea is roughly but clearly described in the form of a patch in the attached file (zookeeper_3.4.5_patch_for_authentication_enforcement.patch), which makes ZooKeeperServer enforce authentication with the given 2 configurations, authenticationEnforced (boolean) and enforcedAuthenticationScheme (string), against every operation coming through the ZooKeeperServer#processPacket method except for OpCode.auth operations. 
The repository base of the patch is "http://svn.apache.org/repos/asf/zookeeper/tags/release-3.4.5/" |
6% | 6% | 16200 | 243000 | 259200 | pull-request-available | 310474 | No Perforce job exists for this issue. | 1 | 310819 | 34 weeks ago | authentication | 0|i1hky7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1633 | Introduce a protocol version to connection initiation message |
Bug | Closed | Major | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 30/Jan/13 14:14 | 13/Mar/14 14:16 | 02/Apr/13 02:32 | 3.4.6 | server | 0 | 6 | ZOOKEEPER-107 | Currently the first message a server sends to another server includes just one field - the server's id (long). This is in QuorumCnxManager.java. This makes changes to the information passed during this initial connection very difficult. This patch will change the first field of the message to be a protocol version (a negative number that can't be a server id). The second field will be the server id. The third field is number of bytes in the remainder of the message. A 3.4 server will read the first field as before, but if this is a negative number it will read the second field to find the server id, and then remove the remainder of the message from the stream. This will not affect 3.4 since 3.4 and earlier servers send just the server id (so the code in the patch will not run unless there is a server > 3.4 trying to connect). This will, however, provide the necessary flexibility for future releases as well as an upgrade path from 3.4 | 310464 | No Perforce job exists for this issue. | 5 | 310809 | 6 years, 2 weeks ago | 0|i1hkvz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1632 | fix memory leaks in cli_st |
Bug | Closed | Minor | Fixed | Flavio Paiva Junqueira | Colin McCabe | Colin McCabe | 29/Jan/13 17:49 | 13/Mar/14 14:17 | 04/Dec/13 05:29 | 3.4.6, 3.5.0 | c client | 0 | 6 | ZOOKEEPER-1556 | Fix two memory leaks revealed by running: {code} valgrind --leak-check=full ./.libs/cli_st 127.0.0.1:2182 create /foo quit {code} |
310304 | No Perforce job exists for this issue. | 4 | 310649 | 6 years, 2 weeks ago | The fix for this issue solves the memory leak spotted in the absence of errors. In the case the completion function is not registered because of an error (e.g., see zoo_async), the line duplicate won't be freed. | 0|i1hjwf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1631 | cppunit test TestOperations.cc fails |
Bug | Open | Minor | Unresolved | Unassigned | Colin McCabe | Colin McCabe | 29/Jan/13 14:19 | 29/Jan/13 14:19 | 3.4.6 | 0 | 1 | I tried running "make run-check" on the cppunit tests, and got the following errors:
{code}
tests/TestOperations.cc:270: Assertion: equality assertion failed [Expected: 1, Actual : 0]
tests/TestOperations.cc:339: Assertion: assertion failed [Expression: timeMock==zh->last_recv]
tests/TestOperations.cc:407: Assertion: equality assertion failed [Expected: 1, Actual : 0]
tests/TestOperations.cc:212: Assertion: equality assertion failed [Expected: -7, Actual : 0]
{code}
I thought this might be an environment issue, but I was able to reproduce it on both Ubuntu 12.04 and OpenSUSE 12.1. |
310250 | No Perforce job exists for this issue. | 0 | 310595 | 7 years, 8 weeks, 2 days ago | 0|i1hjkf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1630 | collect the zk connects/disconnects every cycle and report it to controller. |
Improvement | Resolved | Major | Invalid | Unassigned | kishore gopalakrishna | kishore gopalakrishna | 25/Jan/13 19:25 | 25/Jan/13 19:28 | 25/Jan/13 19:28 | 0 | 1 | The Helix agent must collect the zk connects/disconnects and use the health check framework to convey the information. The controller must disable nodes that are connecting/disconnecting frequently. | 309684 | No Perforce job exists for this issue. | 0 | 302835 | 7 years, 8 weeks, 6 days ago | 0|i1g7nz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1629 | testTransactionLogCorruption occasionally fails |
Bug | Closed | Major | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 24/Jan/13 20:10 | 13/Mar/14 14:17 | 14/Jul/13 22:07 | 3.4.6, 3.5.0 | tests | 0 | 10 | It seems that testTransactionLogCorruption is very flaky; for example, it fails here: https://builds.apache.org/job/ZooKeeper-trunk-jdk7/500/ https://builds.apache.org/job/ZooKeeper-trunk-jdk7/502/ https://builds.apache.org/job/ZooKeeper-trunk-jdk7/503/#showFailuresLink It also fails for older builds (no longer on the website), for example all builds from 381 to 399. |
309042 | No Perforce job exists for this issue. | 6 | 289702 | 6 years, 2 weeks ago | 0|i1dylj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1628 | Documented list of allowable characters in ZK doc not in line with code |
Bug | Resolved | Major | Fixed | Gabriel Reid | Gabriel Reid | Gabriel Reid | 24/Jan/13 09:19 | 25/Jan/13 02:09 | 25/Jan/13 02:09 | 3.5.0 | documentation, java client | 0 | 2 | The documented set of allowable characters in ZooKeeper node names in the Programmer's Guide is not entirely in line with the code. The range of non-printable ASCII characters in the doc ends too early (i.e. 0x19 instead of 0x1F). The range checking code in PathUtils also includes off-by-one errors, so that characters that are on the border of being unallowable are actually allowed by the code. |
308805 | No Perforce job exists for this issue. | 1 | 284551 | 7 years, 8 weeks, 6 days ago |
Reviewed
|
0|i1d3mn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
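The boundary bug in ZOOKEEPER-1628 above is a classic inclusive-vs-exclusive comparison mistake. Below is a minimal illustrative validator (not the actual PathUtils code; the disallowed ranges assumed here follow the Programmer's Guide's description of \u0000-\u001F and \u007F-\u009F):

```java
public class PathCharCheck {
    // Illustrative check, not ZooKeeper's PathUtils. Using <= / >= keeps the
    // border characters (\u001F, \u007F, \u009F) disallowed; an exclusive
    // comparison (< / >) is the off-by-one the issue describes, and ending
    // the first range at \u0019 is the documentation error.
    static boolean isAllowed(char c) {
        if (c <= '\u001F') {
            return false; // control chars including NUL, up to and incl. \u001F
        }
        if (c >= '\u007F' && c <= '\u009F') {
            return false; // DEL plus the C1 control block
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAllowed('\u001F')); // false - border char is rejected
        System.out.println(isAllowed('\u001A')); // false - beyond the doc's old 0x19 cutoff
        System.out.println(isAllowed('a'));      // true
    }
}
```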
| ZooKeeper | ZOOKEEPER-1627 | Add org.apache.zookeeper.common to exported packages in OSGi MANIFEST headers |
Improvement | Closed | Major | Fixed | Arnoud Glimmerveen | Arnoud Glimmerveen | Arnoud Glimmerveen | 24/Jan/13 02:53 | 13/Mar/14 14:17 | 09/Oct/13 18:14 | 3.4.5 | 3.4.6, 3.5.0 | 4 | 5 | Java: 1.6.0_31 OSGi environment: Karaf 2.3.0 |
The utilities contained in the org.apache.zookeeper.common package are not part of the exported packages in an OSGi environment, thus making them not available to other bundles using ZooKeeper. Propose to add the org.apache.zookeeper.common package to the Export-Package MANIFEST header. |
308637 | No Perforce job exists for this issue. | 2 | 282489 | 6 years, 2 weeks ago |
Reviewed
|
0|i1cqwf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1626 | ZOOKEEPER-1366 Zookeeper C client should be tolerant of clock adjustments |
Sub-task | Resolved | Major | Fixed | Colin McCabe | Colin McCabe | Colin McCabe | 21/Jan/13 14:10 | 28/Aug/17 07:30 | 20/Jun/15 19:58 | 3.5.1, 3.6.0 | c client | 1 | 13 | ZOOKEEPER-2516, ZOOKEEPER-1366, ZOOKEEPER-2178 | The Zookeeper C client should use monotonic time when available, in order to be more tolerant of time adjustments. | 307051 | No Perforce job exists for this issue. | 7 | 267152 | 2 years, 29 weeks, 3 days ago | 0|i1a48f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1625 | zkServer.sh is looking for clientPort in config file, but it may no longer be there with ZK-1411 |
Bug | Resolved | Major | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 19/Jan/13 16:33 | 23/Jan/13 14:24 | 22/Jan/13 21:56 | 3.5.0 | 3.5.0 | scripts | 0 | 4 | ZOOKEEPER-1411, ZOOKEEPER-107 | zkServer.sh is currently looking for "clientPort" entry in the static configuration file and uses it to contact the server. With ZOOKEEPER-1411 clientPort is part of the dynamic configuration, and may appear in the separate dynamic configuration file. The "clientPort" entry may no longer be in the static config file. With the proposed patch zkServer.sh first looks in the old (static) config file, then if clientPort is not there, it figures out the id of the server by looking at myid file, and then using that id finds the client port in the dynamic config file. |
305583 | No Perforce job exists for this issue. | 1 | 257309 | 7 years, 9 weeks, 1 day ago | 0|i18fh3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1624 | PrepRequestProcessor abort multi-operation incorrectly |
Bug | Closed | Critical | Fixed | Thawan Kooburat | Thawan Kooburat | Thawan Kooburat | 17/Jan/13 22:11 | 13/Mar/14 14:16 | 10/Oct/13 15:06 | 3.4.6, 3.5.0 | server | 0 | 6 | We found this issue when trying to issue multiple instances of the following multi-op concurrently: multi { 1. create sequential node /a- 2. create node /b }. The expected result is that only the first multi-op request should succeed and the rest of the requests should fail because /b already exists. However, the reported result is that the subsequent multi-ops failed because sequential node creation failed, which should not be possible. Below are the return codes for each sub-op when issuing 3 instances of the above multi-op asynchronously: 1. ZOK, ZOK 2. ZOK, ZNODEEXISTS 3. ZNODEEXISTS, ZRUNTIMEINCONSISTENCY. When I added more debug logging, the cause turned out to be that PrepRequestProcessor rolls back the outstandingChanges of the second multi-op incorrectly, causing sequential node name generation to be wrong. Below are the sequential node names generated by PrepRequestProcessor: 1. create /a-0001 2. create /a-0003 3. create /a-0001. The bug is in the getPendingChanges() method: it failed to copy the ChangeRecord for the parent node ("/"), so rollbackPendingChanges() cannot restore the right previous change record of the parent node when aborting the second multi-op. The impact of this bug is that sequential node creation on the same parent node may fail until the previous one is committed. I am not sure if there are other implications or not. |
zk-review | 305000 | No Perforce job exists for this issue. | 6 | 254379 | 6 years, 2 weeks ago | 0|i17xdz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1623 | Authentication using SASL |
Bug | Open | Major | Unresolved | Unassigned | Christian Wuertz | Christian Wuertz | 16/Jan/13 11:46 | 22/Oct/13 14:39 | 3.4.5 | 1 | 3 | ZOOKEEPER-1512, ZOOKEEPER-1550, ZOOKEEPER-1657, ZOOKEEPER-1510 | First of all, I'm just running some tests and thus I don't want/need any authentication at all, so I didn't configure any. But running my Java client with an Oracle JVM (1.6.38) I run into the following problem:
{code}
2013-01-16 17:40:30,659 [main] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=192.168.2.28:2181 sessionTimeout=5000 watcher=master.Master@eb42cbf
2013-01-16 17:40:30,674 [main] DEBUG org.apache.zookeeper.ClientCnxn - zookeeper.disableAutoWatchReset is false
2013-01-16 17:40:30,698 [Thread-0] DEBUG master.Master - Master waits...
2013-01-16 17:40:30,701 [main-SendThread(Teots-PC:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server Teots-PC/192.168.2.28:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-01-16 17:40:30,706 [main-SendThread(Teots-PC:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to Teots-PC/192.168.2.28:2181, initiating session
2013-01-16 17:40:30,708 [main-SendThread(Teots-PC:2181)] DEBUG org.apache.zookeeper.ClientCnxn - Session establishment request sent on Teots-PC/192.168.2.28:2181
2013-01-16 17:40:30,709 [main-SendThread(Teots-PC:2181)] DEBUG org.apache.zookeeper.client.ZooKeeperSaslClient - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
2013-01-16 17:40:30,730 [main-SendThread(Teots-PC:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server Teots-PC/192.168.2.28:2181, sessionid = 0x13c44254fd70003, negotiated timeout = 5000
2013-01-16 17:40:30,732 [main-EventThread] DEBUG master.Master - Master recieved an event: None
2013-01-16 17:40:30,732 [main-SendThread(Teots-PC:2181)] DEBUG org.apache.zookeeper.client.ZooKeeperSaslClient - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
2013-01-16 17:40:30,732 [main-EventThread] DEBUG master.Master - Master's state: SyncConnected
2013-01-16 17:40:30,732 [main-SendThread(Teots-PC:2181)] DEBUG org.apache.zookeeper.client.ZooKeeperSaslClient - Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration
{code}
This does not happen with an OpenJDK JVM. |
304707 | No Perforce job exists for this issue. | 0 | 254068 | 6 years, 22 weeks, 2 days ago | 0|i17vh3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1622 | session ids will be negative in the year 2022 |
Bug | Closed | Trivial | Fixed | Eric C. Newton | Eric C. Newton | Eric C. Newton | 16/Jan/13 11:29 | 13/Mar/14 14:16 | 16/Dec/13 01:30 | 3.4.0, 3.5.0 | 3.4.6, 3.5.0 | 0 | 5 | Someone decided to use a large number for their myid file. This caused session ids to go negative, and our software (Apache Accumulo) did not handle this very well. While diagnosing the problem, I noticed this in SessionImpl:
{noformat}
public static long initializeNextSession(long id) {
    long nextSid = 0;
    nextSid = (System.currentTimeMillis() << 24) >> 8;
    nextSid = nextSid | (id << 56);
    return nextSid;
}
{noformat}
When the 40th bit of System.currentTimeMillis() is a one, sign extension will fill the upper 8 bits of nextSid, and id will not make the session id unique. I recommend changing the arithmetic right shift to the logical shift:
{noformat}
public static long initializeNextSession(long id) {
    long nextSid = 0;
    nextSid = (System.currentTimeMillis() << 24) >>> 8;
    nextSid = nextSid | (id << 56);
    return nextSid;
}
{noformat}
But we have until the year 2022 before we have to worry about it. |
304699 | No Perforce job exists for this issue. | 1 | 253877 | 6 years, 2 weeks ago |
Reviewed
|
0|i17uan: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
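The sign extension described in ZOOKEEPER-1622 above is easy to demonstrate. The sketch below reproduces the reporter's snippet with the clock passed in as an explicit parameter (an illustrative change so the behavior is testable; this is not the actual ZooKeeper class):

```java
public class SessionIdDemo {
    static long buggySessionId(long millis, long serverId) {
        long nextSid = (millis << 24) >> 8;   // arithmetic shift sign-extends
        return nextSid | (serverId << 56);
    }

    static long fixedSessionId(long millis, long serverId) {
        long nextSid = (millis << 24) >>> 8;  // logical shift zero-fills the top byte
        return nextSid | (serverId << 56);
    }

    public static void main(String[] args) {
        // Bit 39 of System.currentTimeMillis() turned on for good around
        // April 2022, which is when (millis << 24) starts setting the sign bit.
        long millis = 1_650_000_000_000L;
        System.out.println(buggySessionId(millis, 1) < 0);    // true: top byte clobbered by sign bits
        System.out.println(fixedSessionId(millis, 1) < 0);    // false
        System.out.println(fixedSessionId(millis, 1) >>> 56); // prints 1: the server id survives
    }
}
```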
| ZooKeeper | ZOOKEEPER-1621 | ZooKeeper does not recover from crash when disk was full |
Bug | Patch Available | Major | Unresolved | Michi Mutsuzaki | David Arthur | David Arthur | 16/Jan/13 10:24 | 05/Feb/20 07:11 | 3.4.3 | 3.7.0, 3.5.8 | server | 7 | 26 | 0 | 7200 | Ubuntu 12.04, Amazon EC2 instance | The disk that ZooKeeper was using filled up. During a snapshot write, I got the following exception:
{code}
2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:282)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
	at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
	at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
	at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
	at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
{code}
Then many subsequent exceptions like:
{code}
2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:375)
	at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
	at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
	at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
	at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
	at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
	at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
	at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
	at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
	at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
	at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
	at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
	at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
	at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
	at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
	at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
	at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:52)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:116)
	at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
{code}
It seems to me that writing the transaction log should be fully atomic to avoid such situations. Is this not the case? |
100% | 100% | 7200 | 0 | pull-request-available | 304627 | No Perforce job exists for this issue. | 3 | 252823 | 1 year, 13 weeks, 3 days ago | 0|i17nsf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
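ZOOKEEPER-1621 above asks whether file writes can be made atomic. One general-purpose way to get all-or-nothing file updates is the write-temp, fsync, atomic-rename pattern, shown below as an illustrative Java sketch. This is not how ZooKeeper's append-only transaction log actually works (there, recovery has to detect a trailing partial record instead), but it is the standard answer for snapshot-style whole-file writes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicWrite {
    // Write the full payload to a sibling temp file, force it to disk,
    // then atomically rename it over the target. A crash at any point
    // leaves either the old complete file or the new complete file on
    // disk, never a truncated one.
    static void writeAtomically(Path target, byte[] data) {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(ByteBuffer.wrap(data));
            ch.force(true); // flush data and metadata before the rename
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try {
            Files.move(tmp, target,
                    StandardCopyOption.ATOMIC_MOVE,
                    StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small self-check: write into a fresh temp directory and read back.
    static String demo() {
        try {
            Path p = Files.createTempDirectory("aw").resolve("snapshot");
            writeAtomically(p, "hello".getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "hello"
    }
}
```

Note that the rename alone is not enough: without the force() before it, a power loss can leave a fully renamed file with unwritten contents.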
| ZooKeeper | ZOOKEEPER-1620 | NIOServerCnxnFactory (new code introduced in ZK-1504) opens selectors but never closes them |
Bug | Resolved | Major | Fixed | Thawan Kooburat | Alexander Shraer | Alexander Shraer | 14/Jan/13 23:18 | 01/May/13 22:30 | 25/Jan/13 01:47 | 3.5.0 | 3.5.0 | server | 0 | 4 | ZOOKEEPER-107, ZOOKEEPER-1504 | New code (committed in ZK-1504) opens selectors but doesn't close them. Specifically, AbstractSelectThread in its constructor does this.selector = Selector.open(); but possibly also elsewhere. Tests fail for me with the following message:
{code}
java.io.IOException: Too many open files
	at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
	at sun.nio.ch.EPollArrayWrapper.<init>(EPollArrayWrapper.java:69)
	at sun.nio.ch.EPollSelectorImpl.<init>(EPollSelectorImpl.java:52)
	at sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:18)
	at java.nio.channels.Selector.open(Selector.java:209)
	at org.apache.zookeeper.server.NIOServerCnxnFactory$AbstractSelectThread.<init>(NIOServerCnxnFactory.java:128)
	at org.apache.zookeeper.server.NIOServerCnxnFactory$AcceptThread.<init>(NIOServerCnxnFactory.java:177)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:663)
	at org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:127)
	at org.apache.zookeeper.server.quorum.QuorumPeer.<init>(QuorumPeer.java:709)
	at org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:177)
	at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:113)
	at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:71)
	at org.apache.zookeeper.test.ReconfigTest.setUp(ReconfigTest.java:56)
{code} |
304375 | No Perforce job exists for this issue. | 2 | 252551 | 7 years, 8 weeks, 6 days ago |
Reviewed
|
0|i17m3z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1619 | Allow spaces in URL |
Improvement | Resolved | Minor | Fixed | Edward Ribeiro | Todd Nine | Todd Nine | 11/Jan/13 10:57 | 25/Jan/13 01:55 | 25/Jan/13 01:55 | 3.4.5, 3.5.0 | 3.5.0 | java client | 0 | 3 | Currently, spaces are not allowed in the url. This format will work. {code} 10.10.1.1:2181,10.10.1.2:2181/usergrid {code} This format will not (notice the spaces around the comma) {code} 10.10.1.1:2181 , 10.10.1.2:2181/usergrid {code} Please add a trim to both the port and the hostname parsing. |
303965 | No Perforce job exists for this issue. | 2 | 251750 | 7 years, 8 weeks, 6 days ago |
Reviewed
|
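The fix requested in ZOOKEEPER-1619 above amounts to trimming each entry while parsing the connect string. A hedged sketch of that idea (this is not the actual ZooKeeper client parsing code; `parseHosts` is a hypothetical helper):

```java
import java.util.ArrayList;
import java.util.List;

// Tolerant connect-string parsing: split the host list on commas, then trim
// whitespace around each host:port entry so that
// "10.10.1.1:2181 , 10.10.1.2:2181" parses the same as the space-free form.
public class ConnectStringParser {
    public static List<String> parseHosts(String connectString) {
        List<String> hosts = new ArrayList<>();
        for (String part : connectString.split(",")) {
            String trimmed = part.trim(); // the requested fix: trim each entry
            if (!trimmed.isEmpty()) {
                hosts.add(trimmed);
            }
        }
        return hosts;
    }
}
```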
0|i17h5z: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1618 | Disconnected event when stopping leader process |
Improvement | Open | Minor | Unresolved | Unassigned | Peter Nerg | Peter Nerg | 09/Jan/13 06:15 | 26/Feb/13 05:35 | 3.4.4, 3.4.5 | documentation | 1 | 4 | Linux SLES java version "1.6.0_31" |
Running a three-node ZK cluster, I stop/kill the leader node. Immediately all connected clients receive a Disconnected event; a second or so later a SyncConnected event is received. Killing a follower does not produce the same issue/event. The application/clients have been implemented to handle Disconnected events, so they survive. I however expected the ZK client to manage the hiccup during the election process. This produces quite a lot of logging in large clusters that have many services relying on ZK. In some cases we may lose a few requests, as we need a working ZK cluster to execute those requests. IMHO it's not really full high availability if the ZK cluster momentarily takes a dive because the leader goes away. No matter how much redundancy one uses in the form of ZK instances, one may still get processing errors during leader election. I've verified this behavior in both 3.4.4 and 3.4.5 |
303363 | No Perforce job exists for this issue. | 0 | 250703 | 7 years, 4 weeks, 2 days ago | 0|i17apj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1617 | zookeeper version error log info ? |
Bug | Open | Major | Unresolved | Unassigned | wangwei | wangwei | 09/Jan/13 05:46 | 22/Jan/13 17:40 | 0 | 2 | 2012-12-31 10:51:41,562-[TS] INFO main-EventThread org.I0Itec.zkclient.ZkClient - zookeeper state changed (Disconnected) 2012-12-31 10:51:43,008-[TS] INFO main-SendThread(17.22.17.1:2181) org.apache.zookeeper.ClientCnxn - Opening socket connection to server /17.22.17.1:2181. Will not attempt to authenticate using SASL (unknown error) 2012-12-31 10:51:43,009-[TS] INFO main-SendThread(17.22.17.1:2181) org.apache.zookeeper.ClientCnxn - Socket connection established to /17.22.17.1:2181, initiating session 2012-12-31 10:51:43,011-[TS] WARN main-SendThread(17.22.17.1:2181) org.apache.zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable 2012-12-31 10:51:43,011-[TS] INFO main-SendThread(17.22.17.1:2181) org.apache.zookeeper.ClientCnxn - Session establishment complete on server /17.22.17.1:2181, sessionid = 0x13b8a23254100be, negotiated timeout = 6000 2012-12-31 10:51:43,012-[TS] INFO main-EventThread org.I0Itec.zkclient.ZkClient - zookeeper state changed (SyncConnected) 2012-12-31 10:51:47,012-[TS] INFO main-SendThread(17.22.17.1:2181) org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 4002ms for sessionid 0x13b8a23254100be, closing socket connection and attempting reconnect zookeeper client is 3.4.4 zookeeper server is 3.3.4 user 3.4.4 client connection 3.3.4 server |
303359 | No Perforce job exists for this issue. | 0 | 250699 | 7 years, 9 weeks, 2 days ago | 0|i17aon: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1616 | time calculations should use a monotonic clock |
Bug | Resolved | Major | Duplicate | Unassigned | Todd Lipcon | Todd Lipcon | 08/Jan/13 19:35 | 11/Apr/15 17:44 | 11/Apr/15 17:44 | 0 | 9 | ZOOKEEPER-1366 | We recently had an issue with ZooKeeper sessions acting strangely due to a bad NTP setup on a set of hosts. Looking at the code, ZK seems to use System.currentTimeMillis to measure durations or intervals in many places. This is bad since that time can move backwards or skip ahead by several minutes. Instead, it should use System.nanoTime (or a wrapper such as Guava's Stopwatch class) | 303295 | No Perforce job exists for this issue. | 0 | 250468 | 4 years, 49 weeks, 5 days ago | 0|i1799b: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
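The ZOOKEEPER-1616 report above boils down to measuring durations with a wall clock. A minimal sketch of the safe pattern it recommends (the `Stopwatch` class here is my own illustration in the spirit of Guava's, not ZooKeeper code):

```java
// Wall-clock time (System.currentTimeMillis) can step backward or jump
// forward when NTP corrects the clock, so a difference of two readings can
// be negative or wildly wrong. System.nanoTime is monotonic within a JVM
// and is the right tool for measuring intervals.
public class Stopwatch {
    private final long startNanos;

    private Stopwatch(long startNanos) {
        this.startNanos = startNanos;
    }

    public static Stopwatch start() {
        return new Stopwatch(System.nanoTime());
    }

    // Elapsed milliseconds since start(); never negative, unlike a
    // currentTimeMillis() delta taken across an NTP step.
    public long elapsedMs() {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }
}
```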
| ZooKeeper | ZOOKEEPER-1615 | minor typos in ZooKeeper Programmer's Guide web page |
Improvement | Closed | Trivial | Fixed | Evan Zacks | Evan Zacks | Evan Zacks | 07/Jan/13 15:51 | 13/Mar/14 14:17 | 25/Jan/13 02:17 | 3.4.5 | 3.4.6, 3.5.0 | documentation | 0 | 3 | There are some minor typos and misspellings in the Programmer's Guide web page. | documentation | 303005 | No Perforce job exists for this issue. | 1 | 250118 | 6 years, 2 weeks ago |
Reviewed
|
0|i1773j: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1614 | zoo_multi c MT client windows crash |
Bug | Open | Major | Unresolved | Unassigned | Richard Dermer | Richard Dermer | 02/Jan/13 18:36 | 02/Jan/13 18:42 | 3.4.5 | c client | 0 | 1 | Windows C MT client | The Windows C multithreaded client crashes when using the zoo_multi APIs. The underlying issue is that the mutex and condition variables need to be initialized with pthread_cond_init and pthread_mutex_init. Attached are the files I've modified to make this work. In the modified files I've added a "multi" command to the CLI that, when Cli.exe (mt build) is run on Windows without the rest of the fixes, will crash. |
302299 | No Perforce job exists for this issue. | 1 | 248966 | 7 years, 12 weeks, 1 day ago | 0|i16zzr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1613 | The documentation still points to 2008 in the copyright notice |
Bug | Closed | Trivial | Fixed | Edward Ribeiro | Edward Ribeiro | Edward Ribeiro | 30/Dec/12 18:47 | 13/Mar/14 14:16 | 25/Jan/13 02:29 | 3.4.5 | 3.3.7, 3.4.6, 3.5.0 | documentation | 30/Dec/12 | 0 | 3 | While fiddling with docbook to solve the broken links of ZOOKEEPER-1488 I noted that all of the documentation's copyright notices still carry only the year 2008. I am submitting a patch to fix this. | newbie | 302080 | No Perforce job exists for this issue. | 1 | 248721 | 6 years, 2 weeks ago |
Reviewed
|
docs | 0|i16yhb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1612 | Zookeeper unable to recover and start once datadir disk is full and disk space cleared |
Bug | Resolved | Major | Duplicate | Unassigned | suja s | suja s | 27/Dec/12 01:57 | 16/Jan/13 13:37 | 16/Jan/13 13:37 | 3.4.3 | 0 | 3 | Once zookeeper data dir disk becomes full, the process gets shut down. {noformat} 2012-12-14 13:22:26,959 [myid:2] - ERROR [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@276] - Severe unrecoverable error, exiting java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:282) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109) at java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:56) at java.io.DataOutputStream.write(DataOutputStream.java:90) at java.io.FilterOutputStream.write(FilterOutputStream.java:80) at org.apache.jute.BinaryOutputArchive.writeBuffer(BinaryOutputArchive.java:119) at org.apache.zookeeper.server.DataNode.serialize(DataNode.java:168) at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123) at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1115) at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130) at org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:1130) at org.apache.zookeeper.server.DataTree.serialize(DataTree.java:1179) at org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:138) at org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:213) at org.apache.zookeeper.server.persistence.FileSnap.serialize(FileSnap.java:230) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.save(FileTxnSnapLog.java:242) at org.apache.zookeeper.server.ZooKeeperServer.takeSnapshot(ZooKeeperServer.java:274) at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:407) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82) at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:759) {noformat} Later disk space is cleared and zk started again. Startup of zk fails as it is not able to read snapshot properly. (Since load from disk failed it is not able to join peers in the quorum and get a snapshot diff) {noformat} 2012-12-14 16:20:31,489 [myid:2] - INFO [main:FileSnap@83] - Reading snapshot ../dataDir/version-2/snapshot.1000000042 2012-12-14 16:20:31,564 [myid:2] - ERROR [main:QuorumPeer@472] - Unable to load database on disk java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:436) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:428) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:152) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111) at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) 2012-12-14 16:20:31,566 [myid:2] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:473) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:428) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:152) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) Caused by: java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:132) {noformat} |
301854 | No Perforce job exists for this issue. | 0 | 248477 | 7 years, 10 weeks, 1 day ago | 0|i16wzr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1611 | cbcfhbf vfgbfb |
Bug | Resolved | Major | Invalid | Unassigned | prabhu sharma | prabhu sharma | 27/Dec/12 01:23 | 27/Dec/12 03:57 | 27/Dec/12 03:57 | 0 | 2 | 301852 | No Perforce job exists for this issue. | 0 | 248474 | 7 years, 13 weeks ago | 0|i16wz3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1610 | Some classes are using == or != to compare Long/String objects instead of .equals() |
Bug | Closed | Critical | Fixed | Edward Ribeiro | Edward Ribeiro | Edward Ribeiro | 26/Dec/12 12:31 | 13/Mar/14 14:17 | 11/Oct/13 15:19 | 3.4.5, 3.5.0 | 3.4.6, 3.5.0 | java client, quorum | 26/Dec/12 | 0 | 4 | The classes org.apache.zookeeper.client.ZooKeeperSaslClient.java and org.apache.zookeeper.server.quorum.flexible.QuorumHierarchical.java compare Strings and/or Longs using referential equality. Usually this is not a problem, because small Longs are cached and String literals are interned, but I have had problems with this kind of comparison in the past because one production JVM didn't reuse the objects. |
301818 | No Perforce job exists for this issue. | 2 | 248439 | 6 years, 2 weeks ago |
Reviewed
|
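The pitfall in ZOOKEEPER-1610 above is easy to demonstrate: `==` on boxed `Long`s compares object identity, which only coincides with value equality for the small cached range (-128..127 is the only range the JLS guarantees), and `String` identity depends on interning. A self-contained illustration (not the code from the affected classes):

```java
public class EqualityPitfall {
    // Referential comparison: true only if both boxes are the very same
    // object. Outside the boxing cache (and for non-interned Strings),
    // this can be false even when the values are equal.
    public static boolean sameByIdentity(Long a, Long b) {
        return a == b;
    }

    // Value comparison: what the affected classes should use instead.
    public static boolean sameByValue(Long a, Long b) {
        return a.equals(b);
    }
}
```

For example, `sameByIdentity(1000L, 1000L)` is typically false on a stock JVM because 1000 lies outside the guaranteed cache range, while `sameByValue` is always true for equal values.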
0|i16wrb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1609 | Improve ZooKeeper performance under mixed workload |
Improvement | Resolved | Major | Duplicate | Unassigned | Thawan Kooburat | Thawan Kooburat | 22/Dec/12 15:35 | 07/Apr/17 18:51 | 07/Apr/17 18:51 | 3.4.3 | server | 1 | 5 | ZOOKEEPER-2024 | ZOOKEEPER-1505 allows 1 write or N reads to pass through the CommitProcessor at any given time. I ran a performance experiment similar to http://wiki.apache.org/hadoop/ZooKeeper/Performance and found that read throughput drops dramatically when there are write requests. After a bit more investigation, I found that the biggest bottleneck is the request queue entering the CommitProcessor. When the CommitProcessor sees any write request, it must block the entire pipeline and wait for the matching commit from the leader. This means that no read requests (including ping requests) can go through. The time spent waiting for the commit from the leader far exceeds the time spent waiting for 1 write to go through the CommitProcessor. The current plan is to create multiple request queues at the front of the CommitProcessor. Requests are hashed on sessionId and sent to one of the queues. Whenever the CommitProcessor sees a write request on one of the queues, it moves on to process read requests. It will have to unblock the write requests in the same order they were sent to the leader, so it may need to maintain a separate list to keep track of that. The correctness argument is the same as for having more learners in the ensemble: sessions hashed onto different queues are analogous to sessions connecting to different learners. I am hoping that this will improve read throughput and reduce the disconnect rate on an ensemble with a large number of clients |
301660 | No Perforce job exists for this issue. | 0 | 248211 | 2 years, 49 weeks, 6 days ago | 0|i16vcn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
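The queue-per-hash scheme proposed in ZOOKEEPER-1609 above can be sketched independently of the CommitProcessor itself. This is a hedged illustration of the hashing idea only (the shard count, hash function, and class name are my own choices, not the actual design):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the proposal: N request queues in front of the commit stage,
// with each session pinned to one queue by hashing its sessionId. A write
// that blocks one queue then stalls only the sessions hashed there; reads
// queued on the other shards keep flowing, analogous to spreading sessions
// over more learners in the ensemble.
public class ShardedRequestQueues<T> {
    private final List<Queue<T>> queues;

    public ShardedRequestQueues(int shards) {
        queues = new ArrayList<>(shards);
        for (int i = 0; i < shards; i++) {
            queues.add(new ConcurrentLinkedQueue<>());
        }
    }

    // All requests from one session land on the same queue, preserving
    // per-session ordering (the correctness argument in the report).
    public int shardFor(long sessionId) {
        return (Long.hashCode(sessionId) & 0x7fffffff) % queues.size();
    }

    public void enqueue(long sessionId, T request) {
        queues.get(shardFor(sessionId)).add(request);
    }

    public Queue<T> queueAt(int shard) {
        return queues.get(shard);
    }
}
```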
| ZooKeeper | ZOOKEEPER-1608 | Add support for key-value store as optional storage engine |
Improvement | Open | Major | Unresolved | Unassigned | Thawan Kooburat | Thawan Kooburat | 22/Dec/12 00:29 | 24/Jan/13 14:22 | 3.4.3 | server | 1 | 6 | Problem: 1. ZooKeeper needs to load the entire dataset into memory, so the total data size and number of znodes are limited by the amount of available memory. 2. We want to minimize ZooKeeper downtime, but found that it is bound by snapshot loading and writing time. The bigger the database, the longer it takes for the system to recover. The worst case is that if the data size grows too large and initLimit isn't updated accordingly, the quorum won't form after a failure. Implementation (still work in progress): 1. Create a new type of DataTree that supports key-value storage as its backing store. Our current candidate backing store is Oracle's Berkeley DB Java Edition. 2. There is no need to use the snapshot facility for this type of DataTree, since doing a sync write of lastProcessedZxid into the backing store is equivalent to taking a snapshot. However, the system still uses the txnlog as before. The system can be considered as having only a single snapshot; it has to rely on the backing store for corruption detection and recovery. 3. There is no need for any per-node locking. The CommitProcessor (ZOOKEEPER-1505) prevents concurrent reads and writes from reaching the DataTree. The DataTree is also accessed by PrepRequestProcessor (to create ChangeRecords), but I believe that reads and writes to the same znode cannot happen concurrently. 4. There are 3 types of data that must be persisted in the backing store: ACLs, znodes and sessions. However, we also store other data to reduce DataTree initialization time or serialization cost, such as each node's list of children and the list of ephemeral nodes. 5. Each ZooKeeper txn may translate into multiple actions on the DataTree. For example, creating a node may result in AddingZNODE, AddingChildren and AddingEphemeralNode. 
However, as long as these operations are idempotent, there is no need to group them into a transaction, so txns can be replayed on the DataTree without corrupting the data. This also means the system doesn't need a key-value store that supports transaction semantics. Currently, only operations related to quotas break this assumption, because they use an increment operation. 6. The SNAP protocol is supported so the ensemble can be upgraded online. In the future we may extend the SNAP protocol to send raw data files in order to save CPU cost when sending a large database. |
301632 | No Perforce job exists for this issue. | 0 | 248175 | 7 years, 9 weeks ago | 0|i16v4n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1607 | Read-only Observer |
Improvement | Patch Available | Major | Unresolved | Raúl Gutiérrez Segalés | Thawan Kooburat | Thawan Kooburat | 21/Dec/12 23:40 | 14/Dec/19 06:06 | 3.4.3 | 3.7.0 | server | 1 | 8 | ZOOKEEPER-1147 | This feature reuses some of the mechanisms already provided by ReadOnlyZooKeeper (ZOOKEEPER-704) but is implemented in a different way. Goal: read-only clients should be able to connect to the observer, or continue to read data from the observer, even when there is an outage of the underlying quorum. This means that it is possible for the observer to provide 100% read uptime for read-only local sessions (ZOOKEEPER-1147). Implementation: The observer doesn't tear itself down when it loses its connection with the leader. It only closes the connections associated with non-read-only sessions and global sessions, so a client can try another observer if this is a transient failure. During the outage, the observer switches to read-only mode. All pending and future write requests get a NOT_READONLY error. A read-only state transition is sent to every session on that observer, and the observer only accepts new connections from read-only clients. When the observer is able to reconnect to the leader, it sends a state transition (CONNECTED_STATE) to all current sessions. If it is able to synchronize with the leader using a DIFF, the stream of txns is sent through the commit processor instead of being applied to the DataTree directly, to prevent a race condition with in-flight read requests (see ZOOKEEPER-1505). The client will receive watch events correctly and can start issuing write requests. However, if the observer is getting a snapshot, it needs to drop all connections, since it cannot fire watches correctly. |
301631 | No Perforce job exists for this issue. | 1 | 248174 | 5 years, 50 weeks ago | 0|i16v4f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1606 | intermittent failures in ZkDatabaseCorruptionTest on jenkins |
Bug | Closed | Major | Fixed | lixiaofeng | Patrick D. Hunt | Patrick D. Hunt | 21/Dec/12 17:19 | 13/Mar/14 14:17 | 19/Feb/13 03:19 | 3.4.5, 3.5.0 | 3.4.6, 3.5.0 | tests | 0 | 4 | ZkDatabaseCorruptionTest is failing intermittently on jenkins with: "Error Message: the last server is not the leader" Seeing this on jdk7/openjdk7/solaris - 3 times in the last month. https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-openjdk7/2/testReport/junit/org.apache.zookeeper.test/ZkDatabaseCorruptionTest/testCorruption/ |
newbie, test-patch | 301596 | No Perforce job exists for this issue. | 1 | 248138 | 6 years, 2 weeks ago |
Reviewed
|
0|i16uwf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1605 | Make RMI port configurable |
Improvement | Open | Major | Unresolved | Unassigned | Joey Echeverria | Joey Echeverria | 19/Dec/12 07:40 | 19/Dec/12 13:58 | 3.4.5 | jmx | 2 | 4 | JMX uses two ports: the JMX remote port and the RMI server port. The default JMX agent allows you to configure the JMX remote port via the com.sun.management.jmxremote.port system property, but the RMI server port is randomized at runtime. It's possible to create a custom agent that sets the RMI port to a configurable value: http://olegz.wordpress.com/2009/03/23/jmx-connectivity-through-the-firewall/ Making the RMI port configurable is critical to being able to monitor ZK with JMX through a firewall. |
300444 | No Perforce job exists for this issue. | 0 | 244477 | 7 years, 14 weeks, 1 day ago | 0|i168av: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
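The custom-agent approach linked in ZOOKEEPER-1605 above works by building the JMX service URL by hand so that both ports are fixed. A hedged sketch of that technique (the class name and port numbers are illustrative; a real agent would read the ports from system properties):

```java
import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: start a JMX connector server with BOTH ports fixed. The registry
// port is what com.sun.management.jmxremote.port normally controls; putting
// a second explicit port in the URL pins the RMI server port that the
// default agent would otherwise randomize, so both can be opened in a
// firewall.
public class FixedPortJmxAgent {
    public static JMXConnectorServer start(int registryPort, int serverPort)
            throws Exception {
        LocateRegistry.createRegistry(registryPort);
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi://localhost:" + serverPort
            + "/jndi/rmi://localhost:" + registryPort + "/jmxrmi");
        JMXConnectorServer cs =
            JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
        cs.start();
        return cs;
    }

    // Convenience wrapper: start, check, and stop; returns whether the
    // connector was active.
    public static boolean demo(int registryPort, int serverPort) {
        try {
            JMXConnectorServer cs = start(registryPort, serverPort);
            boolean active = cs.isActive();
            cs.stop();
            return active;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```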
| ZooKeeper | ZOOKEEPER-1604 | remove rpm/deb/... packaging |
Task | Closed | Major | Fixed | Chris Nauroth | Patrick D. Hunt | Patrick D. Hunt | 16/Dec/12 12:53 | 21/Jul/16 16:18 | 03/Mar/16 11:14 | 3.3.0 | 3.5.2, 3.6.0 | build | 0 | 12 | ZOOKEEPER-2007, ZOOKEEPER-2124, ZOOKEEPER-2275, ZOOKEEPER-1707, ZOOKEEPER-1708, ZOOKEEPER-1743, ZOOKEEPER-2065, ZOOKEEPER-2095, ZOOKEEPER-2061 | Remove rpm/deb/... packaging from our source repo. Now that BigTop is available and fully supporting ZK it's no longer necessary for us to attempt to include this. | 298954 | No Perforce job exists for this issue. | 2 | 242719 | 4 years, 1 week, 4 days ago |
Reviewed
|
0|i15xg7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1603 | StaticHostProviderTest testUpdateClientMigrateOrNot hangs |
Bug | Closed | Blocker | Fixed | Flavio Paiva Junqueira | Patrick D. Hunt | Patrick D. Hunt | 16/Dec/12 03:27 | 13/Mar/14 14:17 | 26/Sep/13 17:16 | 3.5.0 | 3.4.6, 3.5.0 | tests | 0 | 5 | StaticHostProviderTest method testUpdateClientMigrateOrNot hangs forever. On my laptop getHostName for 10.10.10.* takes 5+ seconds per call. As a result this method effectively runs forever. Every time I run this test it hangs. Consistent. |
298888 | No Perforce job exists for this issue. | 4 | 242560 | 6 years, 2 weeks ago |
Reviewed
|
0|i15wgv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1602 | a change to QuorumPeerConfig's API broke compatibility with HBase |
Bug | Resolved | Blocker | Fixed | Alexander Shraer | Patrick D. Hunt | Patrick D. Hunt | 14/Dec/12 19:40 | 16/Dec/12 06:04 | 16/Dec/12 01:44 | 3.5.0 | 3.5.0 | server | 0 | 3 | The following patch broke an API that's in use by HBase. Otherwise current trunk compiles fine when used by hbase: bq. ZOOKEEPER-1411. Consolidate membership management, distinguish between static and dynamic configuration parameters (Alex Shraer via breed) Considering it a blocker even though it's not really a "public" API. If possible we should add back "getServers" method on QuorumPeerConfig to reduce friction for the hbase team. |
newbie | 298438 | No Perforce job exists for this issue. | 1 | 241856 | 7 years, 14 weeks, 4 days ago |
Reviewed
|
0|i15s4f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1601 | document changes for multi-threaded CommitProcessor and NIOServerCnxn |
Improvement | Resolved | Major | Fixed | Thawan Kooburat | Patrick D. Hunt | Patrick D. Hunt | 12/Dec/12 02:08 | 25/Jan/13 15:24 | 25/Jan/13 02:37 | 3.5.0 | 3.5.0 | documentation | 0 | 1 | ZOOKEEPER-1504 and ZOOKEEPER-1505 introduce changes that should be documented - such as new configuration parameters/defaults, etc... We should also verify that nothing else needs to be changed in the documentation related to these changes. | 297202 | No Perforce job exists for this issue. | 2 | 235196 | 7 years, 8 weeks, 6 days ago |
Reviewed
|
0|i14n13: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1600 | Ephemeral node not getting deleted |
Bug | Resolved | Major | Not A Problem | Patrick D. Hunt | Deepa Muthunoori | Deepa Muthunoori | 10/Dec/12 01:43 | 15/Feb/13 20:21 | 15/Feb/13 20:21 | 0 | 2 | Closing of session is not deleting all the ephemeral nodes. (Eg: From the log, session Id:0x23b6ad21d160000 creates two ephemerals(/CONFIGNODE/NP2147483647 and /ACTIVE/192.168.11.94) but when the session expires, only /CONFIGNODE/NP2147483647 is getting deleted) |
296695 | No Perforce job exists for this issue. | 1 | 233943 | 7 years, 15 weeks, 1 day ago | 0|i14fb3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1599 | 3.3 server cannot join 3.4 quorum |
Bug | Closed | Blocker | Not A Problem | Skye Wanderman-Milne | Skye Wanderman-Milne | Skye Wanderman-Milne | 07/Dec/12 13:54 | 13/Mar/14 14:17 | 17/Sep/13 18:30 | 3.3.6, 3.4.5 | 3.4.6 | quorum | 0 | 7 | When a 3.3 server attempts to join an existing quorum lead by a 3.4 server, the 3.3 server is disconnected while trying to download the leader's snapshot. The 3.3 server restarts and starts the process over again, but is never able to join the quorum. 3.3 server log: {code} 2012-12-07 10:44:34,582 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@294] - Getting a snapshot from leader 2012-12-07 10:44:34,582 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Learner@325] - Setting leader epoch 12 2012-12-07 10:44:54,604 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@82] - Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148) at org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:332) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645) 2012-12-07 10:44:54,605 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2183:Follower@165] - shutdown called java.lang.Exception: shutdown Follower at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649) {code} 3.4 leader log: {code} 2012-12-07 10:51:35,178 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection$Messenger$WorkerReceiver@273] - Backward compatibility mode, server id=3 2012-12-07 10:51:35,178 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@542] - 
Notification: 3 (n.leader), 0x1100000000 (n.zxid), 0x2 (n.round), LOOKING (n.state), 3 (n.sid), 0x11 (n.peerEPoch), LEADING (my state) 2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@263] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@262f4873 2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@318] - Synchronizing with Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x1100000000 2012-12-07 10:51:35,182 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@395] - Sending SNAP 2012-12-07 10:51:35,183 [myid:2] - INFO [LearnerHandler-/127.0.0.1:37654:LearnerHandler@419] - Sending snapshot last zxid of peer is 0x1100000000 zxid of leader is 0x1200000000sent zxid of db as 0x1200000000 2012-12-07 10:51:55,204 [myid:2] - ERROR [LearnerHandler-/127.0.0.1:37654:LearnerHandler@562] - Unexpected exception causing shutdown while sock still open java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:150) at java.net.SocketInputStream.read(SocketInputStream.java:121) at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) at java.io.BufferedInputStream.read(BufferedInputStream.java:254) at java.io.DataInputStream.readInt(DataInputStream.java:387) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450) 2012-12-07 10:51:55,205 [myid:2] - WARN [LearnerHandler-/127.0.0.1:37654:LearnerHandler@575] - ******* GOODBYE /127.0.0.1:37654 ******** {code} |
296538 | No Perforce job exists for this issue. | 1 | 233090 | 6 years, 2 weeks ago | During a rolling upgrade from the 3.3 branch to the 3.4 branch, a 3.3 server won't be able to follow a 3.4, so if there is an election during the upgrade and the new leader is a 3.4 server, then the 3.3 server will be unavailable until it is upgraded. If a 3.3 server leads during the upgrade process and it is the last one to be upgraded, then no problem should be observed. | 0|i14a1j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1598 | Ability to support more digits in the version string |
Improvement | Closed | Major | Fixed | Raja Aluri | Raja Aluri | Raja Aluri | 07/Dec/12 13:24 | 13/Mar/14 14:16 | 12/Dec/12 02:21 | 3.4.6, 3.5.0 | build | 0 | 3 | Ability to support more digits in the version string. ZooKeeper currently expects the version string to be in X.Y.Z-# format. With this change, the default behavior is still the same (X.Y.Z-#) and nothing existing will break, but it also allows people to tag extra digits onto the version string, so that they can add a patch or two in their own environments and distinguish between the Apache ZooKeeper version and a locally modified ZooKeeper version. |
296528 | No Perforce job exists for this issue. | 1 | 233080 | 6 years, 2 weeks ago |
Reviewed
|
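The X.Y.Z-# convention described in ZOOKEEPER-1598 above, and the relaxation it asks for, can be expressed as two regexes. The patterns below are my own approximation of the behavior, not the committed build change:

```java
import java.util.regex.Pattern;

// Approximate check for ZooKeeper-style version strings. The strict legacy
// form is X.Y.Z-#; the relaxed form additionally accepts extra
// dot-separated numeric components after Z (e.g. 3.4.6.1-1), letting a
// locally patched build be distinguished from the Apache release.
public class VersionFormat {
    private static final Pattern STRICT =
        Pattern.compile("\\d+\\.\\d+\\.\\d+(-\\d+)?");
    private static final Pattern RELAXED =
        Pattern.compile("\\d+\\.\\d+\\.\\d+(\\.\\d+)*(-\\d+)?");

    public static boolean isStrict(String v) {
        return STRICT.matcher(v).matches();
    }

    public static boolean isRelaxed(String v) {
        return RELAXED.matcher(v).matches();
    }
}
```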
0|i149zb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1597 | Windows build failing |
Bug | Closed | Major | Fixed | Michi Mutsuzaki | Alexander Shraer | Alexander Shraer | 04/Dec/12 03:28 | 13/Mar/14 14:17 | 17/Nov/13 06:42 | 3.5.0 | 3.4.6, 3.5.0 | build, c client | 0 | 6 | Seems to be related to C client changes done for ZK-1355. We're not sure why these build failures happen on Windows. ################################################################################### ########################## LAST 60 LINES OF THE CONSOLE ########################### [...truncated 376 lines...] .\src\zookeeper.c(768): error C2224: left of '.count' must have struct/union type .\src\zookeeper.c(768): error C2065: 'i' : undeclared identifier .\src\zookeeper.c(770): error C2065: 'resolved' : undeclared identifier .\src\zookeeper.c(770): error C2224: left of '.data' must have struct/union type .\src\zookeeper.c(770): error C2065: 'i' : undeclared identifier .\src\zookeeper.c(773): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(774): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(780): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(781): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(788): error C2143: syntax error : missing ';' before 'type' .\src\zookeeper.c(789): error C2143: syntax error : missing ';' before 'type' .\src\zookeeper.c(792): error C2065: 'num_old' : undeclared identifier .\src\zookeeper.c(792): error C2065: 'num_new' : undeclared identifier .\src\zookeeper.c(794): error C2065: 'found_current' : undeclared identifier .\src\zookeeper.c(797): error C2065: 'num_old' : undeclared identifier .\src\zookeeper.c(797): error C2065: 'num_new' : undeclared identifier .\src\zookeeper.c(814): error C2065: 'found_current' : undeclared identifier .\src\zookeeper.c(819): error C2065: 'num_old' : undeclared identifier .\src\zookeeper.c(819): error C2065: 'num_old' : undeclared identifier .\src\zookeeper.c(819): error C2065: 'num_new' : undeclared identifier .\src\zookeeper.c(819): error C2065: 'num_old' : undeclared 
identifier .\src\zookeeper.c(819): error C2065: 'num_new' : undeclared identifier .\src\zookeeper.c(819): error C2065: 'num_old' : undeclared identifier .\src\zookeeper.c(825): error C2065: 'resolved' : undeclared identifier .\src\zookeeper.c(825): error C2440: '=' : cannot convert from 'int' to 'addrvec_t' .\src\zookeeper.c(843): error C2065: 'resolved' : undeclared identifier .\src\zookeeper.c(843): error C2224: left of '.data' must have struct/union type .\src\zookeeper.c(845): error C2065: 'resolved' : undeclared identifier .\src\zookeeper.c(848): error C2065: 'hosts' : undeclared identifier .\src\zookeeper.c(849): error C2065: 'hosts' : undeclared identifier .\src\zookeeper.c(850): error C2065: 'hosts' : undeclared identifier .\src\zookeeper.c(853): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(1177): error C2143: syntax error : missing ';' before 'const' .\src\zookeeper.c(1179): error C2065: 'endpoint_info' : undeclared identifier .\src\zookeeper.c(1883): error C2143: syntax error : missing ';' before 'type' .\src\zookeeper.c(1884): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(1885): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(1916): error C2143: syntax error : missing ';' before 'type' .\src\zookeeper.c(1920): error C2143: syntax error : missing ';' before 'type' .\src\zookeeper.c(1927): error C2065: 'ssoresult' : undeclared identifier .\src\zookeeper.c(1927): error C2065: 'enable_tcp_nodelay' : undeclared identifier .\src\zookeeper.c(1927): error C2065: 'enable_tcp_nodelay' : undeclared identifier .\src\zookeeper.c(1928): error C2065: 'ssoresult' : undeclared identifier .\src\zookeeper.c(1944): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(1949): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(1962): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(1963): error C2065: 'rc' : undeclared identifier .\src\zookeeper.c(2004): error C2065: 'rc' : undeclared identifier 
.\src\zookeeper.c(2004): fatal error C1003: error count exceeds 100; stopping compilation 38 Warning(s) 102 Error(s) |
295913 | No Perforce job exists for this issue. | 4 | 231939 | 6 years, 2 weeks ago |
Reviewed
|
0|i142xz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1596 | Zab1_0Test should ensure that the file is closed |
Bug | Closed | Major | Fixed | Enis Soztutar | Enis Soztutar | Enis Soztutar | 03/Dec/12 18:34 | 13/Mar/14 14:17 | 11/Dec/12 03:20 | 3.4.5, 3.5.0 | 3.4.6, 3.5.0 | 0 | 4 | Zab1_0Test fails on Windows with: {code} java.io.IOException: Could not rename temporary file C:\Users\ADMINI~1\AppData\Local\Temp\2\test6831881113551099349dir\version-2\acceptedEpoch.tmp to C:\Users\ADMINI~1\AppData\Local\Temp\2\test6831881113551099349dir\version-2\acceptedEpoch at org.apache.zookeeper.common.AtomicFileOutputStream.close(AtomicFileOutputStream.java:82) at org.apache.zookeeper.server.quorum.QuorumPeer.writeLongToFile(QuorumPeer.java:1121) at org.apache.zookeeper.server.quorum.QuorumPeer.setAcceptedEpoch(QuorumPeer.java:1148) at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:281) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72) at org.apache.zookeeper.server.quorum.Zab1_0Test$1.run(Zab1_0Test.java:450) {code} The file handles for currentEpoch and acceptedEpoch are not closed, so the delete fails on Windows. |
295822 | No Perforce job exists for this issue. | 1 | 231172 | 6 years, 2 weeks ago |
Reviewed
|
0|i13y7j: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
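The failure mode in ZOOKEEPER-1596 — a rename that fails on Windows because the stream writing the temp file is still open — can be sketched with the usual write-to-tmp-then-rename pattern. This is an illustrative sketch, not the AtomicFileOutputStream code; the class and method names are assumptions.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the "write .tmp, then rename" pattern. On Windows the
// rename fails while the handle is open, so the close must happen first --
// guaranteed here by try-with-resources.
public class AtomicWriteSketch {
    static void writeAtomically(Path target, String value) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        // try-with-resources closes the handle before the rename runs
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream(tmp.toFile()), StandardCharsets.UTF_8)) {
            w.write(value);
        }
        if (!tmp.toFile().renameTo(target.toFile())) {
            throw new IOException("Could not rename temporary file " + tmp + " to " + target);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("epoch-test");
        Path target = dir.resolve("acceptedEpoch");
        writeAtomically(target, "17");
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
    }
}
```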
| ZooKeeper | ZOOKEEPER-1595 | Sockets should be read until exhausted |
Improvement | Open | Minor | Unresolved | Unassigned | Nikita Vetoshkin | Nikita Vetoshkin | 03/Dec/12 02:46 | 11/Dec/12 16:16 | server | 2 | 2 | Tested on Linux x64 with Oracle JDK6 | The {{doIO}} method in {{NIOServerCnxn}} should read (and write too) until {{read}}/{{write}} returns 0. It's common practice when working with non-blocking sockets: when the underlying system call (multiplexer) signals that a socket is readable, one should {{recv(2)}} all data from the kernel buffer until {{recv}} fails with {{EAGAIN}} or {{EWOULDBLOCK}}. The patch does two things (I know it's not a good idea to mix several changes, but I couldn't avoid it): * splits {{doIO}} into {{doRead}} and {{doWrite}} * wraps reading with {{while (true)}} It's pretty easy to instrument the code with a counter and print how many loops we performed until the socket was not readable again. I wrote a simple Python script (http://pastebin.com/N5ifM330) which creates 6000 nodes with 5k of data each, keeping 20 concurrent create requests in progress through one connection. With this script and strace attached to the JVM I counted epoll_wait syscalls during the test and got ~9500 before vs ~8000 after. The run time measurement is very rough, but it's around ~19 secs before vs ~17.5 after. |
newbie, performance | 293370 | No Perforce job exists for this issue. | 1 | 167194 | 7 years, 15 weeks, 2 days ago | 0|i0szaf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
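The "read until the channel is drained" pattern ZOOKEEPER-1595 proposes can be sketched with a non-blocking NIO channel: keep calling {{read}} until it returns 0 (nothing buffered right now) or -1 (end of stream). This is an illustrative sketch using a `Pipe` so it runs standalone; the `doRead` name echoes the patch, but this is not the NIOServerCnxn code.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

// Hypothetical sketch of draining a non-blocking channel in one pass,
// as the issue suggests for NIOServerCnxn.
public class DrainSketch {
    // Keep reading until read() == 0 (no more buffered data) or -1 (EOF).
    // Returns total bytes consumed.
    static int doRead(Pipe.SourceChannel ch, ByteBuffer buf) throws IOException {
        int total = 0;
        while (true) {
            buf.clear();
            int n = ch.read(buf);
            if (n <= 0) {
                break; // 0 = drained for now, -1 = end of stream
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false); // non-blocking, like the server socket
        pipe.sink().write(ByteBuffer.wrap(new byte[5000]));
        int consumed = doRead(pipe.source(), ByteBuffer.allocate(512));
        System.out.println(consumed); // the whole 5000 bytes drained in one doRead call
    }
}
```

The payoff described in the issue is exactly this: one readiness notification from the multiplexer services all buffered data, so fewer epoll_wait round trips.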
| ZooKeeper | ZOOKEEPER-1594 | TestReconfig intermittently fails |
Bug | Resolved | Major | Duplicate | Marshall McMullen | Marshall McMullen | Marshall McMullen | 30/Nov/12 00:27 | 05/Nov/16 13:30 | 05/Nov/16 13:30 | 3.5.0 | c client | 0 | 6 | ZOOKEEPER-2152, ZOOKEEPER-1712 | We've seen an intermittent failure in one of the C client tests TestReconfig which was committed as part of ZOOKEEPER-1355. The test that is failing is failing *before* any rebalancing algorithm is invoked. After inspecting this we've concluded it is a failure to properly seed the random number generator properly. This same problem was seen and solved on the Java client side so we just need to do something similar on the C client side. The assertion: Build/trunk/src/c/tests/TestReconfig.cc:571: Assertion: assertion failed [Expression: numClientsPerHost.at(i) >= lowerboundClientsPerServer(numClients, numServers)] [exec] [exec] Failures !!! [exec] [exec] Run: 38 Failure total: 1 Failures: 1 Errors: 0 [exec] [exec] make: *** [run-check] Error 1 [exec] [exec] BUILD FAILED [exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1262: The following error occurred while executing this line: [exec] /home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1272: exec returned: 2 Also this one: From the latest build logs: [exec] Zookeeper_watchers::testChildWatcher2 : elapsed 54 : OK [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/c/tests/TestReconfig.cc:183: Assertion: equality assertion failed [Expected: 1, Actual : 0] [exec] Failures !!! [exec] Run: 67 Failure total: 1 Failures: 1 Errors: 0 [exec] FAIL: zktest-mt [exe |
292905 | No Perforce job exists for this issue. | 0 | 164120 | 3 years, 19 weeks, 5 days ago | 0|i0sgbb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1593 | Add Debian style /etc/default/zookeeper support to init script |
Improvement | Resolved | Minor | Not A Problem | Unassigned | Dirkjan Bussink | Dirkjan Bussink | 29/Nov/12 05:55 | 10/May/13 19:12 | 10/May/13 19:12 | 3.4.5 | scripts | 0 | 4 | Debian Linux 6.0 | In our configuration we use a different data directory for Zookeeper. The problem is that the current Debian init.d script has the default location hardcoded: ZOOPIDDIR=/var/lib/zookeeper/data ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid By using the standard Debian practice of allowing for a /etc/default/zookeeper we can redefine these variables to point to the correct location: ZOOPIDDIR=/var/lib/zookeeper/data ZOOPIDFILE=${ZOOPIDDIR}/zookeeper_server.pid [ -r /etc/default/zookeeper ] && . /etc/default/zookeeper This currently can't be done through /usr/libexec/zkEnv.sh, since that is loaded before ZOOPIDDIR and ZOOPIDFILE are set. Any change there would therefore undo the setup made in for example /etc/zookeeper/zookeeper-env.sh. |
292749 | No Perforce job exists for this issue. | 1 | 163395 | 6 years, 45 weeks, 6 days ago | Patch for supporting /etc/default/zookeeper in Debian init script | 0|i0sbuf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1592 | support deleting a node silently |
Improvement | Open | Major | Unresolved | Unassigned | Jimmy Xiang | Jimmy Xiang | 27/Nov/12 22:52 | 20/Dec/13 15:03 | 0 | 1 | Sometimes, we want to delete a node. But we are not sure if the node exists or not. In this case, we want the delete method succeed instead of throwing a NoNodeException. Although we can have a wrapper method to do it, it should be better to build this in to ZK. | 292504 | No Perforce job exists for this issue. | 1 | 162009 | 7 years, 15 weeks ago | 0|i0s3af: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
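The wrapper method the reporter of ZOOKEEPER-1592 mentions amounts to an idempotent delete: swallow the no-node error so deleting a missing node counts as success. The `ZkLike` and `NoNodeException` types below are hypothetical stand-ins so the sketch runs standalone; the real Java client throws `KeeperException.NoNodeException` from `ZooKeeper.delete()`.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the "no node" failure of a delete.
class NoNodeException extends Exception {}

// Hypothetical stand-in for a ZooKeeper-like client, for illustration only.
class ZkLike {
    private final Set<String> nodes = new HashSet<>();
    void create(String path) { nodes.add(path); }
    void delete(String path) throws NoNodeException {
        if (!nodes.remove(path)) throw new NoNodeException();
    }
}

public class SilentDelete {
    // Idempotent delete: succeeds whether or not the node existed.
    static void deleteSilently(ZkLike zk, String path) {
        try {
            zk.delete(path);
        } catch (NoNodeException ignored) {
            // node was already gone -- treat as success
        }
    }

    public static void main(String[] args) {
        ZkLike zk = new ZkLike();
        zk.create("/a");
        deleteSilently(zk, "/a"); // removes the node
        deleteSilently(zk, "/a"); // no-op instead of an exception
        System.out.println("ok");
    }
}
```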
| ZooKeeper | ZOOKEEPER-1591 | Windows build is broken because inttypes.h doesn't exist |
Bug | Resolved | Major | Fixed | Marshall McMullen | Michi Mutsuzaki | Michi Mutsuzaki | 27/Nov/12 18:32 | 01/Dec/12 06:03 | 30/Nov/12 15:44 | 3.5.0 | 3.5.0 | c client | 0 | 4 | Windows | addrvec.h includes inttypes.h, but it is not present in the windows build environment. https://builds.apache.org/job/ZooKeeper-trunk-WinVS2008/596/console f:\hudson\hudson-slave\workspace\zookeeper-trunk-winvs2008\trunk\src\c\src\addrvec.h(22): fatal error C1083: Cannot open include file: 'inttypes.h': No such file or directory |
292480 | No Perforce job exists for this issue. | 1 | 161985 | 7 years, 16 weeks, 5 days ago | 0|i0s353: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1590 | Patch to add zk.updateServerList(newServerList) broke the build |
Bug | Resolved | Blocker | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 27/Nov/12 15:08 | 28/Nov/12 06:07 | 28/Nov/12 02:18 | 3.5.0 | 3.5.0 | 0 | 3 | Here is the related output of jenkins: {noformat} validate-xdocs: [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/docs/src/documentation/content/xdocs/zookeeperProgrammers.xml:578:5: The element type "para" must be terminated by the matching end-tag "</para>". [exec] [exec] BUILD FAILED [exec] /home/jenkins/tools/forrest/latest/main/targets/validate.xml:135: Could not validate document /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/trunk/src/docs/src/documentation/content/xdocs/zookeeperProgrammers.xml [exec] {noformat} |
292435 | No Perforce job exists for this issue. | 1 | 161205 | 7 years, 17 weeks, 1 day ago |
Reviewed
|
0|i0rybr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1589 | Documentation list has wrong numbering |
Bug | Resolved | Minor | Invalid | Mahadev Konar | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 22/Nov/12 16:07 | 28/Nov/12 01:41 | 28/Nov/12 01:41 | 0 | 1 | Check the version numbers of the documentation links on the project front page: {noformat} Release 3.4.5(stable) Release 3.4.5(current) {noformat} |
259597 | No Perforce job exists for this issue. | 0 | 125186 | 7 years, 17 weeks, 1 day ago | 0|i0lrzr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1588 | Write Mechanism of Apache Zookeeper and Neociclo Accord |
Test | Resolved | Blocker | Not A Problem | Unassigned | CHANDAN BAGAI | CHANDAN BAGAI | 22/Nov/12 12:19 | 22/Nov/12 12:28 | 22/Nov/12 12:28 | tests | 0 | 2 | 259581 | No Perforce job exists for this issue. | 0 | 125170 | 7 years, 18 weeks ago | 0|i0lrw7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1587 | Provide simple way to determine IP address of an ephemeral znode's owner |
Improvement | Open | Major | Unresolved | Unassigned | Todd Lipcon | Todd Lipcon | 21/Nov/12 19:31 | 13/Dec/12 03:09 | 3.4.3 | 0 | 3 | ZOOKEEPER-829 | Occasionally I've run into operational cases where an ephemeral znode exists, and is held by some client, but it's not clear which client is the holder. By getting the znode from the shell, one can find the session ID, but as far as I'm aware the only way to reverse that to an IP is by grepping logs, etc. | 259451 | No Perforce job exists for this issue. | 0 | 124740 | 7 years, 15 weeks ago | 0|i0lp8v: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1586 | tarballs for zkfuse don't compile out of tree |
Bug | Patch Available | Major | Unresolved | Raúl Gutiérrez Segalés | Raúl Gutiérrez Segalés | Raúl Gutiérrez Segalés | 19/Nov/12 02:14 | 09/Oct/13 02:41 | 3.5.0 | contrib-zkfuse | 0 | 1 | 258548 | No Perforce job exists for this issue. | 1 | 119767 | 6 years, 24 weeks, 1 day ago | 0|i0kujr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1585 | make dist for src/c broken in trunk |
Bug | Resolved | Major | Fixed | Raúl Gutiérrez Segalés | Raúl Gutiérrez Segalés | Raúl Gutiérrez Segalés | 19/Nov/12 01:04 | 02/Mar/16 20:34 | 26/Nov/12 20:37 | 3.5.0 | 3.5.0 | c client | 0 | 4 | make dist from trunk is failing because of a wrong reference to src/zookeeper_log.h (which exists in include/). | 258541 | No Perforce job exists for this issue. | 1 | 119744 | 7 years, 17 weeks, 2 days ago | 0|i0kuen: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1584 | Adding mvn-install target for deploying the zookeeper artifacts to .m2 repository. |
Improvement | Closed | Minor | Fixed | Ashish Singh | Ashish Singh | Ashish Singh | 14/Nov/12 16:27 | 13/Mar/14 14:16 | 14/Dec/12 19:46 | 3.4.3 | 3.4.6, 3.5.0 | build | 0 | 3 | There is no mvn install functionality for deploying the ZooKeeper distribution artifacts to the local .m2 repository. | 257871 | No Perforce job exists for this issue. | 1 | 118148 | 6 years, 2 weeks ago |
Reviewed
|
0|i0kkk7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1583 | Document maxClientCnxns in conf/zoo_sample.cfg |
Improvement | Closed | Critical | Fixed | Christopher Tubbs | Christopher Tubbs | Christopher Tubbs | 14/Nov/12 14:20 | 13/Mar/14 14:16 | 13/Dec/12 01:07 | 3.4.4 | 3.4.6, 3.5.0 | documentation | 0 | 4 | 300 | 300 | 0% | It is silly that maxClientCnxns being left at its default, and that default being too low, is the number one issue for users (according to some: https://raw.github.com/strangeloop/strangeloop2012/master/slides/sessions/Ting-BuildingAnImpenetrableZooKeeper.pdf). It seems to me that this can be resolved by an extremely simple documentation change: add a commented-out configuration line in conf/zoo_sample.cfg that shows the default but, more importantly, shows users that the configuration option exists. |
0% | 0% | 300 | 300 | configuration, documentation, example | 257850 | No Perforce job exists for this issue. | 2 | 118126 | 6 years, 2 weeks ago |
Reviewed
|
0|i0kkfb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
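The one-line documentation fix ZOOKEEPER-1583 asks for might look like the fragment below in conf/zoo_sample.cfg. The value shown is illustrative only — check your release's actual default rather than trusting this number.

```
# the maximum number of client connections per IP address.
# increase this if many clients connect from the same host
# (the value below is illustrative; consult your release's default)
#maxClientCnxns=60
```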
| ZooKeeper | ZOOKEEPER-1582 | EndOfStreamException: Unable to read additional data from client |
Bug | Resolved | Blocker | Duplicate | Unassigned | Yanming Zhou | Yanming Zhou | 13/Nov/12 04:28 | 21/Nov/18 07:19 | 14/Dec/12 14:28 | 0 | 18 | windows 7 jdk 7 |
1. download zookeeper-3.4.4.tar.gz and unzip 2. rename conf/zoo_sample.cfg to zoo.cfg 3. run zkServer.cmd 4. run zkCli.cmd zkCli cannot connect to zkServer; it blocks. The zkServer console prints: 2012-11-13 17:28:05,302 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x13af9131eee0000, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:722) 2012-11-13 17:28:05,308 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:54810 which had sessionid 0x13af9131eee0000 |
257346 | No Perforce job exists for this issue. | 0 | 114430 | 1 year, 17 weeks, 1 day ago | 0|i0jxxr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1581 | change copyright in notice to 2012 |
Bug | Closed | Major | Fixed | Benjamin Reed | Benjamin Reed | Benjamin Reed | 08/Nov/12 09:56 | 13/Mar/14 14:17 | 12/Dec/12 02:00 | 3.3.7, 3.4.6, 3.5.0 | build | 0 | 4 | it's 2012 so the copyright in notice.txt should end with 2012 | 255955 | No Perforce job exists for this issue. | 1 | 90828 | 6 years, 2 weeks ago |
Reviewed
|
0|i0fwa7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1580 | QuorumPeer.setRunning is not used |
Bug | Resolved | Minor | Fixed | maoling | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 08/Nov/12 06:13 | 30/Jan/18 05:44 | 30/Jan/18 02:07 | 3.5.3, 3.4.11, 3.6.0 | 3.5.4, 3.6.0 | 0 | 5 | setRunning is a public method and a search did not indicate that it is used anywhere, not even in tests. In fact, I believe we should not change "running" freely and we should only do it when calling shutdown. | 255933 | No Perforce job exists for this issue. | 0 | 90791 | 2 years, 7 weeks, 2 days ago | 0|i0fw1z: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1579 | Compile error of UnixOperationSystemMXBean with open JDK |
Bug | Open | Major | Unresolved | Michelle Chen | Michelle Chen | Michelle Chen | 07/Nov/12 23:06 | 31/Oct/13 12:22 | 3.3.4, 3.4.3 | 1 | 7 | 604800 | 604800 | 0% | SOLR-4526 | ZooKeeper invokes the getOpenFileDescriptorCount() function in com.sun.management.UnixOperatingSystemMXBean, which only exists in the Sun JDK; OpenJDK does not implement this function. [javac] /root/zookeeper-3.3.4/src/java/test/org/apache/zookeeper/test/ClientBase.java:57: package com.sun.management does not exist [javac] import com.sun.management.UnixOperatingSystemMXBean; [javac] ^ [javac] /root/zookeeper-3.3.4/src/java/test/org/apache/zookeeper/test/QuorumBase.java:39: package com.sun.management does not exist [javac] import com.sun.management.UnixOperatingSystemMXBean; [javac] ^ [javac] /root/zookeeper-3.3.4/src/java/test/org/apache/zookeeper/test/ClientTest.java:48: package com.sun.management does not exist [javac] import com.sun.management.UnixOperatingSystemMXBean; [javac] ^ [javac] /root/zookeeper-3.3.4/src/java/test/org/apache/zookeeper/test/QuorumUtil.java:39: package com.sun.management does not exist [javac] import com.sun.management.UnixOperatingSystemMXBean; |
0% | 0% | 604800 | 604800 | patch | 255891 | No Perforce job exists for this issue. | 0 | 90720 | 6 years, 21 weeks ago |
Reviewed
|
0|i0fvm7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
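One way to avoid the compile-time dependency on com.sun.management that breaks ZOOKEEPER-1579 (and the related QuorumBase/QuorumUtil issues) is to look the interface up reflectively and degrade gracefully when the runtime lacks it. This sketch mirrors the idea behind the OSMXBean wrapper mentioned in the linked issues, not its actual code; the class name here is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.lang.reflect.Method;

// Hypothetical sketch: no import of com.sun.management anywhere, so this
// compiles on JDKs that don't ship UnixOperatingSystemMXBean.
public class FdCount {
    static long openFileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        try {
            Class<?> unix = Class.forName("com.sun.management.UnixOperatingSystemMXBean");
            if (unix.isInstance(os)) {
                Method m = unix.getMethod("getOpenFileDescriptorCount");
                return (Long) m.invoke(os);
            }
        } catch (Exception e) {
            // fall through: the interface is absent on this JDK/platform
        }
        return -1; // not available here
    }

    public static void main(String[] args) {
        long fds = openFileDescriptorCount();
        // either a real count (Unix, supporting JDKs) or -1 elsewhere
        System.out.println(fds != 0);
    }
}
```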
| ZooKeeper | ZOOKEEPER-1578 | org.apache.zookeeper.server.quorum.Zab1_0Test failed due to hard code with 33556 port |
Bug | Closed | Major | Fixed | Michelle Chen | Michelle Chen | Michelle Chen | 07/Nov/12 22:55 | 13/Mar/14 14:17 | 17/Dec/12 02:13 | 3.4.3 | 3.4.6, 3.5.0 | 0 | 6 | 86400 | 86400 | 0% | org.apache.zookeeper.server.quorum.Zab1_0Test was failed both with SUN JDK and open JDK. [junit] Running org.apache.zookeeper.server.quorum.Zab1_0Test [junit] Tests run: 8, Failures: 0, Errors: 1, Time elapsed: 18.334 sec [junit] Test org.apache.zookeeper.server.quorum.Zab1_0Test FAILED Zab1_0Test log: Zab1_0Test log: 2012-07-11 23:17:15,579 [myid:] - INFO [main:Leader@427] - Shutdown called java.lang.Exception: shutdown Leader! reason: end of test at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:427) at org.apache.zookeeper.server.quorum.Zab1_0Test.testLastAcceptedEpoch(Zab1_0Test.java:211) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:48) 2012-07-11 23:17:15,584 [myid:] - ERROR [main:Leader@139] - Couldn't bind to port 33556 java.net.BindException: Address already in use at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:402) at java.net.ServerSocket.bind(ServerSocket.java:328) at java.net.ServerSocket.bind(ServerSocket.java:286) at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:137) at org.apache.zookeeper.server.quorum.Zab1_0Test.createLeader(Zab1_0Test.java:810) at org.apache.zookeeper.server.quorum.Zab1_0Test.testLeaderInElectingFollowers(Zab1_0Test.java:224) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 2012-07-11 23:17:20,202 [myid:] - ERROR [LearnerHandler-bdvm039.svl.ibm.com/9.30.122.48:40153:LearnerHandler@559] - Unex pected exception causing shutdown while sock still open java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at 
java.io.DataInputStream.readInt(DataInputStream.java:370) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:291) 2012-07-11 23:17:20,203 [myid:] - WARN [LearnerHandler-bdvm039.svl.ibm.com/9.30.122.48:40153:LearnerHandler@569] - **** *** GOODBYE bdvm039.svl.ibm.com/9.30.122.48:40153 ******** 2012-07-11 23:17:20,204 [myid:] - INFO [Thread-20:Leader@421] - Shutting down 2012-07-11 23:17:20,204 [myid:] - INFO [Thread-20:Leader@427] - Shutdown called java.lang.Exception: shutdown Leader! reason: lead ended This failure suggests port 33556 is already in use, but checking with system commands shows it is not actually in use. The port is hard-coded in the unit test; we can improve it with a code patch. |
0% | 0% | 86400 | 86400 | patch | 255890 | No Perforce job exists for this issue. | 2 | 90719 | 6 years, 2 weeks ago |
Reviewed
|
0|i0fvlz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1577 | Update website with info on how to report security bugs |
Task | Open | Minor | Unresolved | Unassigned | Eli Collins | Eli Collins | 07/Nov/12 22:46 | 07/Nov/12 22:46 | documentation | 0 | 1 | The website should be updated with information on how to report potential security vulnerabilities. In Hadoop land we have a private security list that anyone case post to that we point to on our list page: Hadoop example http://hadoop.apache.org/general_lists.html#Security. | 255888 | No Perforce job exists for this issue. | 0 | 90716 | 7 years, 20 weeks ago | 0|i0fvlb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1576 | Zookeeper cluster - failed to connect to cluster if one of the provided IPs causes java.net.UnknownHostException |
Bug | Resolved | Major | Fixed | Edward Ribeiro | Tally Tsabary | Tally Tsabary | 07/Nov/12 06:11 | 19/Dec/18 06:43 | 28/Jun/14 11:52 | 3.5.0 | 3.5.0 | server | 2 | 18 | ZOOKEEPER-1734, YARN-9151, YARN-7550 | Three 3.4.3 zookeeper servers in cluster, linux. | Using a cluster of three 3.4.3 zookeeper servers. All the servers are up, but on the client machine, the firewall is blocking one of the servers. The following exception is happening, and the client is not connected to any of the other cluster members. The exception:Nov 02, 2012 9:54:32 PM com.netflix.curator.framework.imps.CuratorFrameworkImpl logError SEVERE: Background exception was not retry-able or retry gave up java.net.UnknownHostException: scnrmq003.myworkday.com at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$1.lookupAllHostAddr(Unknown Source) at java.net.InetAddress.getAddressesFromNameService(Unknown Source) at java.net.InetAddress.getAllByName0(Unknown Source) at java.net.InetAddress.getAllByName(Unknown Source) at java.net.InetAddress.getAllByName(Unknown Source) at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:440) at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:375) The code at the org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) is : public StaticHostProvider(Collection<InetSocketAddress> serverAddresses) throws UnknownHostException { for (InetSocketAddress address : serverAddresses) { InetAddress resolvedAddresses[] = InetAddress.getAllByName(address .getHostName()); for (InetAddress resolvedAddress : resolvedAddresses) { this.serverAddresses.add(new InetSocketAddress(resolvedAddress .getHostAddress(), address.getPort())); } } ...... 
The for-loop does not try to resolve the rest of the servers on the list if there is an UnknownHostException at InetAddress.getAllByName(address.getHostName()), and it fails the client connection creation. I was expecting the connection would be created for the other members of the cluster. Also, InetAddress resolution is a blocking call, and if it takes a very long time (longer than the defined timeout), that too should let us continue trying to connect to the other servers on the list. Assuming this is fixed and we get a connection to the currently available servers, I think ZooKeeper should keep retrying the not-yet-connected server of the cluster, so it can be used later when it is back. If one of the servers on the list is not available during connection creation, then it should be retried every x time despite the fact that we |
255723 | No Perforce job exists for this issue. | 5 | 90498 | 2 years, 33 weeks, 4 days ago | 0|i0fu8v: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
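The behavior the reporter of ZOOKEEPER-1576 expects from StaticHostProvider — resolve each server independently and skip hosts that fail rather than aborting the whole constructor — can be sketched as below. This is an illustration of the idea, not the patch that actually fixed the issue.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one unresolvable entry no longer prevents
// connecting to the rest of the ensemble.
public class ResolveSkipSketch {
    static List<InetSocketAddress> resolveAll(List<InetSocketAddress> servers) {
        List<InetSocketAddress> resolved = new ArrayList<>();
        for (InetSocketAddress address : servers) {
            try {
                for (InetAddress addr : InetAddress.getAllByName(address.getHostString())) {
                    resolved.add(new InetSocketAddress(addr, address.getPort()));
                }
            } catch (UnknownHostException e) {
                // skip this server instead of failing the whole connection setup
                System.out.println("skipped " + address.getHostString());
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> servers = new ArrayList<>();
        servers.add(InetSocketAddress.createUnresolved("localhost", 2181));
        // ".invalid" is reserved and never resolves
        servers.add(InetSocketAddress.createUnresolved("no-such-host.invalid", 2181));
        List<InetSocketAddress> ok = resolveAll(servers);
        System.out.println(!ok.isEmpty()); // localhost still resolved
    }
}
```

The reporter's further point — periodically retrying the skipped host — would sit on top of this, re-running resolution on a timer rather than only at construction.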
| ZooKeeper | ZOOKEEPER-1575 | adding .gitattributes to prevent CRLF and LF mismatches for source and text files |
Bug | Resolved | Major | Fixed | Raja Aluri | Raja Aluri | Raja Aluri | 06/Nov/12 20:31 | 04/Apr/14 07:12 | 03/Apr/14 21:36 | 3.4.7, 3.5.0 | 0 | 4 | adding .gitattributes to prevent CRLF and LF mismatches for source and text files | 255613 | No Perforce job exists for this issue. | 1 | 90264 | 5 years, 50 weeks, 6 days ago | 0|i0fssv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1574 | mismatched CR/LF endings in text files |
Improvement | Resolved | Minor | Fixed | Raja Aluri | Raja Aluri | Raja Aluri | 06/Nov/12 20:21 | 09/Apr/14 23:22 | 03/Apr/14 21:31 | 3.4.6, 3.5.0 | 3.4.7, 3.5.0 | 0 | 5 | Source code in the ZooKeeper repo has a bunch of files with CRLF endings. With more development happening on Windows there is a higher chance of more CRLF files getting into the source tree. I would like to avoid that by creating a .gitattributes file which prevents sources from having CRLF entries in text files. But before adding the .gitattributes file we need to normalize the existing tree, so that people who sync after the .gitattributes change won't end up with a bunch of modified files in their workspace. I am adding a couple of links here to give more of a primer on what exactly the issue is and how we are trying to fix it. [http://git-scm.com/docs/gitattributes#_checking_out_and_checking_in] [http://stackoverflow.com/questions/170961/whats-the-best-crlf-handling-strategy-with-git] I will submit a separate bug and patch for .gitattributes |
255612 | No Perforce job exists for this issue. | 3 | 90263 | 5 years, 50 weeks ago | 0|i0fssn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
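A .gitattributes file of the kind ZOOKEEPER-1574/1575 describe might look like the fragment below. This is an illustrative sketch, not the file actually committed; the specific patterns are assumptions.

```
# Normalize line endings for text files on checkin (illustrative sketch)
*        text=auto
*.java   text
*.c      text
*.h      text
*.sh     text eol=lf
*.bat    text eol=crlf
```

With `text=auto`, git normalizes files it detects as text to LF in the repository, which is why the existing tree must be renormalized first, exactly as the issue explains.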
| ZooKeeper | ZOOKEEPER-1573 | Unable to load database due to missing parent node |
Bug | Closed | Critical | Fixed | Vinayakumar B | Thawan Kooburat | Thawan Kooburat | 01/Nov/12 19:13 | 13/Mar/14 14:17 | 10/Feb/14 15:53 | 3.4.3, 3.5.0 | 3.4.6, 3.5.0 | server | 0 | 12 | ZOOKEEPER-1813, ZOOKEEPER-1879 | While replaying the txnlog on the data tree, the server has code to detect a missing parent node. This code block was last modified as part of ZOOKEEPER-1333. In our production, we found a case where this check returns a false positive. The sequence of txns is as follows: zxid 1: create /prefix/a zxid 2: create /prefix/a/b zxid 3: delete /prefix/a/b zxid 4: delete /prefix/a The server starts capturing a snapshot at zxid 1. However, by the time it traverses the data tree down to /prefix, txn 4 is already applied and /prefix has no children. When the server restores from the snapshot, it processes the txnlog starting from zxid 2. This txn generates a missing-parent error and the server refuses to start up. The same check allowed me to discover the bug in ZOOKEEPER-1551, but I don't know if we have any option besides removing this check to solve this issue. |
253905 | No Perforce job exists for this issue. | 5 | 81579 | 6 years, 2 weeks ago |
Reviewed
|
0|i0eb7r: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1572 | Add an async interface for multi request |
Improvement | Resolved | Major | Fixed | Sijie Guo | Sijie Guo | Sijie Guo | 01/Nov/12 03:57 | 30/Jul/15 03:51 | 03/Feb/13 10:36 | 3.4.5 | 3.5.0 | java client | 0 | 9 | ZOOKEEPER-2237 | ZOOKEEPER-1066 | Currently there is no async interface for multi request in ZooKeeper java client. | review | 253561 | No Perforce job exists for this issue. | 3 | 79004 | 4 years, 34 weeks ago | 0|i0dvbr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1571 | Allow QuorumUtil.java build with IBM Java |
Improvement | Resolved | Major | Duplicate | Unassigned | Paulo Ricardo Paz Vital | Paulo Ricardo Paz Vital | 30/Oct/12 11:22 | 01/May/13 22:29 | 28/Nov/12 06:21 | 3.4.4 | 3.4.4 | tests | 0 | 1 | ZOOKEEPER-1474, ZOOKEEPER-1564 | Linux (x86_64), RHEL 6.3, IBM Java 6 SR 11 | The org.apache.zookeeper.test.QuorumUtil class imports the com.sun.management.UnixOperatingSystemMXBean class, which fails to build when using IBM Java 6 SR 11. This issue is resolved by the new OSMXBean class proposed in ZOOKEEPER-1474. The OSMXBean class (org.apache.zookeeper.server.util.OSMXBean) is a wrapper for the implementation of com.sun.management.UnixOperatingSystemMXBean and decides whether to use the Sun API or its own implementation depending on the runtime (vendor) used. |
test | 253160 | No Perforce job exists for this issue. | 1 | 75887 | 7 years, 17 weeks, 1 day ago | 0|i0dc33: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1570 | Allow QuorumBase.java build with IBM Java |
Improvement | Resolved | Major | Duplicate | Unassigned | Paulo Ricardo Paz Vital | Paulo Ricardo Paz Vital | 30/Oct/12 11:20 | 01/May/13 22:29 | 28/Nov/12 06:20 | 3.4.4 | 3.4.4 | tests | 0 | 1 | ZOOKEEPER-1474, ZOOKEEPER-1564 | Linux, RHEL 6.3, IBM Java 6 SR 11 | The org.apache.zookeeper.test.QuorumBase class imports the com.sun.management.UnixOperatingSystemMXBean class, which fails to build when using IBM Java 6 SR 11. This issue is resolved by the new OSMXBean class proposed in ZOOKEEPER-1474. The OSMXBean class (org.apache.zookeeper.server.util.OSMXBean) is a wrapper for the implementation of com.sun.management.UnixOperatingSystemMXBean and decides whether to use the Sun API or its own implementation depending on the runtime (vendor) used. |
test | 253159 | No Perforce job exists for this issue. | 1 | 75886 | 7 years, 17 weeks, 1 day ago | 0|i0dc2v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1569 | support upsert: setData if the node exists, otherwise, create a new node |
Improvement | Open | Major | Unresolved | Unassigned | Jimmy Xiang | Jimmy Xiang | 23/Oct/12 13:47 | 20/Dec/13 15:03 | 1 | 3 | HBASE-7022 | Currently, ZooKeeper supports setData and create. If it can support upsert like in SQL, it will be great. | 250604 | No Perforce job exists for this issue. | 3 | 62043 | 7 years, 14 weeks, 3 days ago | 0|i0azl3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
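The upsert requested in ZOOKEEPER-1569 is usually built client-side as a retry loop over setData and create, since a concurrent create or delete can interleave between the two calls. The sketch below simulates that loop against a toy in-memory store; the Store class and its exceptions are hypothetical stand-ins for the ZooKeeper client API, not the real thing.

```java
import java.util.HashMap;
import java.util.Map;

// Client-side "upsert" retry loop, simulated against a toy store. In real
// ZooKeeper the same shape applies: setData fails on a missing node, create
// fails on an existing node, and either can race with other clients.
public class UpsertSketch {
    static class NoNodeException extends Exception {}
    static class NodeExistsException extends Exception {}

    static class Store {
        private final Map<String, byte[]> nodes = new HashMap<>();
        synchronized void create(String path, byte[] data) throws NodeExistsException {
            if (nodes.containsKey(path)) throw new NodeExistsException();
            nodes.put(path, data);
        }
        synchronized void setData(String path, byte[] data) throws NoNodeException {
            if (!nodes.containsKey(path)) throw new NoNodeException();
            nodes.put(path, data);
        }
        synchronized byte[] get(String path) { return nodes.get(path); }
    }

    // setData if the node exists, otherwise create it; loop because another
    // client may create or delete the node between the two calls.
    public static void upsert(Store store, String path, byte[] data) {
        while (true) {
            try { store.setData(path, data); return; }
            catch (NoNodeException missing) {
                try { store.create(path, data); return; }
                catch (NodeExistsException raced) { /* lost a race; retry setData */ }
            }
        }
    }

    public static void main(String[] args) {
        Store s = new Store();
        upsert(s, "/config", "v1".getBytes()); // creates
        upsert(s, "/config", "v2".getBytes()); // overwrites
        System.out.println(new String(s.get("/config"))); // v2
    }
}
```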
| ZooKeeper | ZOOKEEPER-1568 | multi should have a non-transaction version |
Improvement | Open | Major | Unresolved | Unassigned | Jimmy Xiang | Jimmy Xiang | 23/Oct/12 13:43 | 20/Dec/13 15:04 | 0 | 4 | HBASE-7022 | Currently multi is transactional, i.e. all or none. However, sometimes we don't want that: we want all operations to be executed. Even if some operation(s) fail, it is OK. We just need to know the result of each operation. | 250603 | No Perforce job exists for this issue. | 2 | 62042 | 7 years, 15 weeks ago | 0|i0azkv: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
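The non-transactional multi proposed in ZOOKEEPER-1568 boils down to "attempt every operation, record each outcome, abort nothing". A minimal sketch of that shape, with all names hypothetical rather than a ZooKeeper API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Non-transactional batch: every operation is attempted; failures are
// recorded per operation instead of aborting the whole batch.
public class BatchSketch {
    public static List<String> runAll(List<Callable<String>> ops) {
        List<String> results = new ArrayList<>();
        for (Callable<String> op : ops) {
            try {
                results.add("OK: " + op.call());
            } catch (Exception e) {
                results.add("ERR: " + e.getMessage()); // note the failure, keep going
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Callable<String>> ops = List.of(
                () -> "created /a",
                () -> { throw new IllegalStateException("node exists"); },
                () -> "set /b");
        // All three run; the middle failure does not roll back the others.
        runAll(ops).forEach(System.out::println);
    }
}
```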
| ZooKeeper | ZOOKEEPER-1567 | JMX can't be disabled with zkEnv.sh |
Bug | Patch Available | Major | Unresolved | Jakub Lekstan | Jakub Lekstan | Jakub Lekstan | 17/Oct/12 14:34 | 17/Oct/12 15:09 | 3.4.4 | scripts | 0 | 1 | zkServer.sh looks for the JMX variables before "including" zkEnv.sh, so JMX cannot be disabled from zkEnv.sh or the scripts it "includes". Patch included. |
249356 | No Perforce job exists for this issue. | 1 | 57423 | 7 years, 23 weeks, 1 day ago | 0|i0a73j: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1566 | progress quits due to zxid not in order |
Bug | Open | Major | Unresolved | Unassigned | Zhou wenjian | Zhou wenjian | 17/Oct/12 06:09 | 17/Oct/12 06:11 | 0 | 1 | 2012-10-17 15:04:28,006 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@116] - Got 0x3800000002 expected 0x3800000001 2012-10-17 15:04:28,007 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@116] - Got zxid 0x3800000001 expected 0x3800000003 2012-10-17 15:04:28,007 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@116] - Got zxid 0x3800000003 expected 0x3800000002 2012-10-17 15:04:28,009 - FATAL [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@112] - Committing zxid 0x3800000003 but next pending txn 0x3800000001 |
249252 | No Perforce job exists for this issue. | 0 | 57307 | 7 years, 23 weeks, 1 day ago | 0|i0a6dr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1565 | Allow ClientTest.java build with IBM Java |
Improvement | Resolved | Major | Duplicate | Unassigned | Paulo Ricardo Paz Vital | Paulo Ricardo Paz Vital | 16/Oct/12 15:56 | 01/May/13 22:29 | 28/Nov/12 06:18 | 3.4.4 | 3.4.4 | tests | 0 | 1 | ZOOKEEPER-1474, ZOOKEEPER-1564 | Linux, RHEL 6.3, IBM Java 6 SR 11 | The org.apache.zookeeper.test.ClientTest class imports the com.sun.management.UnixOperatingSystemMXBean class, which fails to build when using IBM Java 6 SR 11. This issue is resolved by the new OSMXBean class proposed in ZOOKEEPER-1474. The OSMXBean class (org.apache.zookeeper.server.util.OSMXBean) is a wrapper for the implementation of com.sun.management.UnixOperatingSystemMXBean and decides whether to use the Sun API or its own implementation depending on the runtime (vendor) used. |
test | 249113 | No Perforce job exists for this issue. | 1 | 57059 | 7 years, 17 weeks, 1 day ago | 0|i0a4un: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1564 | Allow JUnit test build with IBM Java |
Improvement | Closed | Major | Fixed | Paulo Ricardo Paz Vital | Paulo Ricardo Paz Vital | Paulo Ricardo Paz Vital | 15/Oct/12 15:32 | 13/Mar/14 14:17 | 11/Dec/12 02:46 | 3.4.4, 3.4.5, 3.5.0 | 3.4.6, 3.5.0 | tests | 0 | 3 | ZOOKEEPER-1474, ZOOKEEPER-1565, ZOOKEEPER-1570, ZOOKEEPER-1571 | Linux, RHEL 6.3, IBM Java 6 SR 11 | The org.apache.zookeeper.test.ClientBase, org.apache.zookeeper.test.ClientTest, org.apache.zookeeper.test.QuorumBase and org.apache.zookeeper.test.QuorumUtil classes import the com.sun.management.UnixOperatingSystemMXBean class, which fails to build when using IBM Java 6 SR 11. This issue is resolved by the new OSMXBean class proposed in ZOOKEEPER-1474. The OSMXBean class (org.apache.zookeeper.server.util.OSMXBean) is a wrapper for the implementation of com.sun.management.UnixOperatingSystemMXBean and decides whether to use the Sun API or its own implementation depending on the runtime (vendor) used. |
test | 248796 | No Perforce job exists for this issue. | 3 | 56277 | 6 years, 2 weeks ago | 0|i0a00v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1563 | Wrong solution - unable to build under Windows with Visual Studio |
Bug | Resolved | Major | Fixed | Unassigned | Jakub Lekstan | Jakub Lekstan | 13/Oct/12 10:55 | 28/Dec/12 07:35 | 28/Dec/12 07:35 | 3.4.4 | c client | 0 | 3 | Windows 7 x64 Visual Studio C++ 2010 Express |
When I try to open zookeeper.sln, VS wants me to convert the project. While the conversion is taking place I get the message: "A file with the name: "[path]\zookeeper.vcxproj" already exists on disk. Do you want to overwrite the project and its imported property sheets" After that I get the next message with the same text, but about Cli.vcxproj. No matter if I click Yes or No, the conversion process fails, and both projects (Cli and zookeeper) are marked as unavailable. If I close VS and open zookeeper.sln once again it wants me to convert; now if I answer Yes the projects are again unavailable, but if I answer No the projects are available but empty. |
248459 | No Perforce job exists for this issue. | 0 | 55508 | 7 years, 12 weeks, 6 days ago | visual studio | 0|i09v9z: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1562 | Memory leaks in zoo_multi API |
Bug | Closed | Trivial | Fixed | Deepak Jagtap | Deepak Jagtap | Deepak Jagtap | 12/Oct/12 21:03 | 13/Mar/14 14:16 | 03/Feb/13 01:42 | 3.4.3, 3.4.4 | 3.4.6, 3.5.0 | c client | 0 | 6 | Zookeeper client and server both are running on CentOS 6.3 | Valgrind is reporting a memory leak for zoo_multi operations. ==4056== 2,240 (160 direct, 2,080 indirect) bytes in 1 blocks are definitely lost in loss record 18 of 24 ==4056== at 0x4A04A28: calloc (vg_replace_malloc.c:467) ==4056== by 0x504D822: create_completion_entry (zookeeper.c:2322) ==4056== by 0x5052833: zoo_amulti (zookeeper.c:3141) ==4056== by 0x5052A8B: zoo_multi (zookeeper.c:3240) It looks like the completion entries for individual operations in a multi-update transaction are not getting freed. My observation is that the memory leak size depends on the number of operations in a single multi-update transaction |
patch | 248154 | No Perforce job exists for this issue. | 1 | 53975 | 6 years, 2 weeks ago | The zoo_multi API used to leak memory while deserializing the response from the zookeeper server: completion entries for the individual operations in a zoo_multi transaction weren't being freed. This patch resolves the memory leak by destroying the completion entries in the deserialize_multi function. |
Reviewed
|
zoo_multi memory-leak | 0|i09lv3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1561 | Zookeeper client may hang on a server restart |
Bug | Resolved | Major | Duplicate | Unassigned | Jacky007 | Jacky007 | 11/Oct/12 03:43 | 23/Dec/12 23:18 | 23/Dec/12 22:54 | 3.5.0 | 3.5.0 | java client | 1 | 3 | ZOOKEEPER-1437, ZOOKEEPER-1560, ZOOKEEPER-107 | In the doIO method of ClientCnxnSocketNIO {noformat} if (p != null) { outgoingQueue.removeFirstOccurrence(p); updateLastSend(); if ((p.requestHeader != null) && (p.requestHeader.getType() != OpCode.ping) && (p.requestHeader.getType() != OpCode.auth)) { p.requestHeader.setXid(cnxn.getXid()); } p.createBB(); ByteBuffer pbb = p.bb; sock.write(pbb); if (!pbb.hasRemaining()) { sentCount++; if (p.requestHeader != null && p.requestHeader.getType() != OpCode.ping && p.requestHeader.getType() != OpCode.auth) { pending.add(p); } } {noformat} When the sock.write(pbb) method throws an exception, the packet will not be cleaned up (it ends up in neither outgoingQueue nor the pending queue). If the client waits for it, it will wait forever... |
247277 | No Perforce job exists for this issue. | 0 | 46150 | 7 years, 13 weeks, 3 days ago | It is fixed in ZOOKEEPER-1560. | 0|i089kn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
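The hang in ZOOKEEPER-1561 comes from a packet that ends up in neither queue, with no one left to complete it. The defensive shape of the fix is: when the socket write throws, remove the packet from both queues and fail it explicitly so the waiter wakes up. Below is a minimal simulation of that error path; Packet, Wire, and the queues are toy stand-ins for ClientCnxn's internals, not the real classes.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;

// Toy model of ClientCnxnSocketNIO.doIO's failure path: if the write throws
// and the packet is simply dropped, a caller blocked on it waits forever.
// The fix is to complete the packet exceptionally on the error path.
public class PacketCleanupSketch {
    static class Packet {
        final CompletableFuture<Void> done = new CompletableFuture<>();
    }

    interface Wire { void write(Packet p) throws IOException; }

    public static void send(Packet p, Deque<Packet> outgoing, Deque<Packet> pending, Wire wire) {
        outgoing.add(p);
        try {
            wire.write(p);
            outgoing.remove(p);
            pending.add(p); // now awaiting the server's reply
        } catch (IOException e) {
            // Error path: drop the packet from both queues AND fail it, so a
            // thread blocked on p.done is released instead of waiting forever.
            outgoing.remove(p);
            pending.remove(p);
            p.done.completeExceptionally(e);
        }
    }

    public static void main(String[] args) {
        Deque<Packet> outgoing = new ArrayDeque<>(), pending = new ArrayDeque<>();
        Packet p = new Packet();
        send(p, outgoing, pending, pkt -> { throw new IOException("connection reset"); });
        System.out.println("failed fast: " + p.done.isCompletedExceptionally());
    }
}
```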
| ZooKeeper | ZOOKEEPER-1560 | Zookeeper client hangs on creation of large nodes |
Bug | Resolved | Major | Fixed | Skye Wanderman-Milne | Igor Motov | Igor Motov | 10/Oct/12 19:45 | 31/Oct/12 19:00 | 31/Oct/12 14:44 | 3.4.4, 3.5.0 | 3.4.5, 3.5.0 | java client | 0 | 12 | ZOOKEEPER-1561 | To reproduce, try creating a node with 0.5M of data using java client. The test will hang waiting for a response from the server. See the attached patch for the test that reproduces the issue. It seems that ZOOKEEPER-1437 introduced a few issues to {{ClientCnxnSocketNIO.doIO}} that prevent {{ClientCnxnSocketNIO}} from sending large packets that require several invocations of {{SocketChannel.write}} to complete. The first issue is that the call to {{outgoingQueue.removeFirstOccurrence(p);}} removes the packet from the queue even if the packet wasn't completely sent yet. It looks to me that this call should be moved under {{if (!pbb.hasRemaining())}} The second issue is that {{p.createBB()}} is reinitializing {{ByteBuffer}} on every iteration, which confuses {{SocketChannel.write}}. And the third issue is caused by extra calls to {{cnxn.getXid()}} that increment xid on every iteration and confuse the server. |
247171 | No Perforce job exists for this issue. | 11 | 45345 | 7 years, 21 weeks, 1 day ago |
Reviewed
|
0|i084lr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
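The three bugs in the ZOOKEEPER-1560 report share one underlying rule: serialize the packet once, then call write on the same ByteBuffer until hasRemaining() is false, and treat the packet as sent only then. The demo below shows a genuine partial write using a non-blocking java.nio Pipe as a stand-in for the socket; the writeFully helper name is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;

// A large packet needs several write calls. The buffer must be created once
// and reused across calls -- rebuilding it (or re-assigning the xid) on every
// attempt, as the pre-fix doIO did, corrupts the stream.
public class PartialWriteSketch {
    public static int writeFully(Pipe.SinkChannel sink, Pipe.SourceChannel src,
                                 ByteBuffer out) throws Exception {
        ByteBuffer drain = ByteBuffer.allocate(64 * 1024);
        int calls = 0;
        while (out.hasRemaining()) { // the packet is "sent" only when this ends
            sink.write(out);         // may consume only part of the buffer
            calls++;
            drain.clear();
            src.read(drain);         // the reader frees pipe capacity
        }
        return calls;
    }

    public static void main(String[] args) throws Exception {
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false);
        pipe.source().configureBlocking(false);
        ByteBuffer big = ByteBuffer.allocate(512 * 1024); // ~0.5 MB, as in the report
        int calls = writeFully(pipe.sink(), pipe.source(), big);
        System.out.println("drained in " + calls + " write calls");
    }
}
```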
| ZooKeeper | ZOOKEEPER-1559 | ZOOKEEPER-1549 Learner should not snapshot uncommitted state |
Sub-task | Open | Major | Unresolved | Hongchao Deng | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 06/Oct/12 09:46 | 02/Dec/14 21:50 | quorum | 0 | 5 | The code in Learner.java is a bit entangled for backward compatibility reasons. We need to make sure that we can remove the calls to take a snapshot without breaking it. | 244712 | No Perforce job exists for this issue. | 0 | 31349 | 5 years, 16 weeks, 1 day ago | 0|i05q8v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1558 | ZOOKEEPER-1549 Leader should not snapshot uncommitted state |
Sub-task | Closed | Blocker | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 06/Oct/12 09:45 | 13/Mar/14 14:17 | 19/Oct/13 06:06 | 3.4.6 | 3.4.6 | quorum | 0 | 5 | Leader currently takes a snapshot when it calls loadData in the beginning of the lead() method. The loaded data, however, may contain uncommitted state. | 244711 | No Perforce job exists for this issue. | 8 | 31348 | 6 years, 2 weeks ago | 0|i05q8n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1557 | jenkins jdk7 test failure in testBadSaslAuthNotifiesWatch |
Bug | Closed | Major | Fixed | Eugene Joseph Koontz | Patrick D. Hunt | Patrick D. Hunt | 04/Oct/12 19:03 | 13/Mar/14 14:17 | 23/Oct/13 21:10 | 3.4.5, 3.5.0 | 3.4.6, 3.5.0 | server, tests | 0 | 6 | ZOOKEEPER-1550, ZOOKEEPER-1648 | Failure of testBadSaslAuthNotifiesWatch on the jenkins jdk7 job: https://builds.apache.org/job/ZooKeeper-trunk-jdk7/407/ haven't seen this before. |
241704 | No Perforce job exists for this issue. | 4 | 11414 | 6 years, 2 weeks ago | Committed to 3.4.6/trunk. Thanks Eugene. | 0|i02b6v: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1556 | Memory leak reported by valgrind mt version |
Bug | Open | Minor | Unresolved | Unassigned | André Martin | André Martin | 03/Oct/12 15:10 | 24/Oct/17 00:39 | 3.4.4 | c client | 0 | 5 | ZOOKEEPER-2015, ZOOKEEPER-1632 | Valgrind reports the following memory leak when using the c-client (mt): ==11674== 18 bytes in 9 blocks are indirectly lost in loss record 14 of 173 ==11674== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==11674== by 0xC8064A: ia_deserialize_string (recordio.c:271) ==11674== by 0xC81F2E: deserialize_String_vector (zookeeper.jute.c:247) ==11674== by 0xC842F9: deserialize_GetChildrenResponse (zookeeper.jute.c:874) ==11674== by 0xC7E9F0: zookeeper_process (zookeeper.c:1904) ==11674== by 0xC7FE5B: do_io (mt_adaptor.c:439) ==11674== by 0x4E39E99: start_thread (pthread_create.c:308) ==11674== by 0x5FA6DBC: clone (clone.S:112) ==11674== ==11674== 90 (72 direct, 18 indirect) bytes in 49 blocks are definitely lost in loss record 139 of 173 ==11674== at 0x4C29DB4: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==11674== by 0xC81EEE: deserialize_String_vector (zookeeper.jute.c:245) ==11674== by 0xC842F9: deserialize_GetChildrenResponse (zookeeper.jute.c:874) ==11674== by 0xC7E9F0: zookeeper_process (zookeeper.c:1904) ==11674== by 0xC7FE5B: do_io (mt_adaptor.c:439) ==11674== by 0x4E39E99: start_thread (pthread_create.c:308) ==11674== by 0x5FA6DBC: clone (clone.S:112) |
242157 | No Perforce job exists for this issue. | 0 | 12704 | 2 years, 21 weeks, 2 days ago | 0|i02j5j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1555 | ACLs are not respected for node deletion |
Bug | Resolved | Critical | Not A Problem | Unassigned | Guillaume Nodet | Guillaume Nodet | 03/Oct/12 11:59 | 03/Oct/12 12:08 | 03/Oct/12 12:08 | 3.4.3 | 0 | 1 | Any session can delete nodes with restricted ACLs. | 242158 | No Perforce job exists for this issue. | 0 | 12705 | 7 years, 25 weeks, 1 day ago | 0|i02j5r: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1554 | Can't use zookeeper client without SASL |
Bug | Closed | Blocker | Fixed | Unassigned | Guillaume Nodet | Guillaume Nodet | 03/Oct/12 11:35 | 13/Mar/14 14:17 | 30/Oct/13 00:21 | 3.4.4 | 3.4.6, 3.5.0 | 3 | 10 | ZOOKEEPER-1550, ZOOKEEPER-1696 | The ZooKeeperSaslClient correctly detects that it should not use SASL when nothing is configured, however the SendThread waits forever because clientTunneledAuthenticationInProgress() returns true instead of false. | 242159 | No Perforce job exists for this issue. | 0 | 12706 | 6 years, 2 weeks ago | 0|i02j5z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1553 | Findbugs configuration is missing some dependencies |
Bug | Closed | Minor | Fixed | Sean Busbey | Sean Busbey | Sean Busbey | 01/Oct/12 18:00 | 13/Mar/14 14:16 | 12/Dec/12 03:03 | 3.5.0 | 3.4.6, 3.5.0 | build | 0 | 4 | While updating the findbugs configuration to account for a change in log4j versions I noticed findbugs complaining about access to the netty and slf4j classes. Steps to reproduce: # install findbugs to $FINDBUGS_HOME # run ant -Dfindbugs.home="$FINDBUGS_HOME" findbugs |
239567 | No Perforce job exists for this issue. | 1 | 2351 | 6 years, 2 weeks ago |
Reviewed
|
0|i00r9z: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1552 | Enable sync request processor in Observer |
Improvement | Closed | Major | Fixed | Flavio Paiva Junqueira | Thawan Kooburat | Thawan Kooburat | 30/Sep/12 21:28 | 13/Mar/14 14:17 | 30/Sep/13 16:55 | 3.4.3 | 3.4.6, 3.5.0 | quorum, server | 0 | 8 | ZOOKEEPER-1551, ZOOKEEPER-1462, ZOOKEEPER-1758 | Observer doesn't forward its txns to SyncRequestProcessor, so it never persists the txns onto disk or periodically creates snapshots. This increases the start-up time since it will get the entire snapshot if the observer has been running for a long time. |
239578 | No Perforce job exists for this issue. | 9 | 2366 | 6 years, 2 weeks ago | 0|i00rdb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1551 | Observers ignore txns that come after snapshot and UPTODATE |
Bug | Closed | Blocker | Fixed | Thawan Kooburat | Thawan Kooburat | Thawan Kooburat | 30/Sep/12 20:57 | 13/Mar/14 14:17 | 08/Oct/13 12:34 | 3.4.3 | 3.4.6, 3.5.0 | quorum, server | 2 | 8 | ZOOKEEPER-1552 | In Learner.java, txns which comes after the learner has taken the snapshot (after NEWLEADER packet) are stored in packetsNotCommitted. The follower has special logic to apply these txns at the end of syncWithLeader() method. However, the observer will ignore these txns completely, causing data inconsistency. | 239554 | No Perforce job exists for this issue. | 7 | 2333 | 6 years, 2 weeks ago | 0|i00r5z: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1550 | ZooKeeperSaslClient does not finish anonymous login on OpenJDK |
Bug | Resolved | Blocker | Fixed | Eugene Joseph Koontz | Robert Macomber | Robert Macomber | 26/Sep/12 12:32 | 16/Jan/13 14:00 | 28/Sep/12 13:06 | 3.4.4 | 3.4.5 | java client | 0 | 6 | ZOOKEEPER-1623, ZOOKEEPER-1477, ZOOKEEPER-1554, ZOOKEEPER-1557 | On OpenJDK, {{javax.security.auth.login.Configuration.getConfiguration}} does not throw an exception. {{ZooKeeperSaslClient.clientTunneledAuthenticationInProgress}} uses an exception from that method as a proxy for "this client is not configured to use SASL" and as a result no commands can be sent, since it is still waiting for auth to complete. [Link to mailing list discussion|http://comments.gmane.org/gmane.comp.java.zookeeper.user/2667] The relevant bit of logs from OpenJDK and Oracle versions of 'connect and do getChildren("/")': {code:title=OpenJDK} INFO [main] 2012-09-25 14:02:24,545 com.socrata.Main Waiting for connection... DEBUG [main] 2012-09-25 14:02:24,548 com.socrata.zookeeper.ZooKeeperProvider Waiting for connected-state... INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,576 org.apache.zookeeper.ClientCnxn Opening socket connection to server mike.local/10.0.2.106:2181. 
Will not attempt to authenticate using SASL (unknown error) INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,584 org.apache.zookeeper.ClientCnxn Socket connection established to mike.local/10.0.2.106:2181, initiating session DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,586 org.apache.zookeeper.ClientCnxn Session establishment request sent on mike.local/10.0.2.106:2181 INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,600 org.apache.zookeeper.ClientCnxn Session establishment complete on server mike.local/10.0.2.106:2181, sessionid = 0x139ff2e85b60005, negotiated timeout = 40000 DEBUG [main-EventThread] 2012-09-25 14:02:24,614 com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Connected) DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:24,636 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes. DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,923 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes. DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes. DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:37,924 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes. 
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes. DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,260 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes. DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes. DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes. DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,261 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes. DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,265 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,12 replyHeader:: 0,0,0 request:: '/,F response:: v{} until SASL authentication completes. DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,265 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes. 
DEBUG [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,266 org.apache.zookeeper.ClientCnxnSocketNIO deferring non-priming packet: clientPath:null serverPath:null finished:false header:: -2,11 replyHeader:: null request:: null response:: nulluntil SASL authentication completes. INFO [main-SendThread(mike.local:2181)] 2012-09-25 14:02:51,266 org.apache.zookeeper.ClientCnxn Client session timed out, have not heard from server in 26668ms for sessionid 0x139ff2e85b60005, closing socket connection and attempting reconnect DEBUG [main-EventThread] 2012-09-25 14:02:51,377 com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Disconnected) {code} {code:title=Oracle} INFO [main] 2012-09-25 14:03:16,315 com.socrata.Main Waiting for connection... DEBUG [main] 2012-09-25 14:03:16,319 com.socrata.zookeeper.ZooKeeperProvider Waiting for connected-state... INFO [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,335 org.apache.zookeeper.ClientCnxn Opening socket connection to server 10.0.2.106/10.0.2.106:2181. 
Will not attempt to authenticate using SASL (Unable to locate a login configuration) INFO [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,344 org.apache.zookeeper.ClientCnxn Socket connection established to 10.0.2.106/10.0.2.106:2181, initiating session DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,346 org.apache.zookeeper.ClientCnxn Session establishment request sent on 10.0.2.106/10.0.2.106:2181 DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,347 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,351 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration INFO [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,368 org.apache.zookeeper.ClientCnxn Session establishment complete on server 10.0.2.106/10.0.2.106:2181, sessionid = 0x139ff2e85b60006, negotiated timeout = 40000 DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,371 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,371 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG [main-EventThread] 2012-09-25 14:03:16,385 com.socrata.zookeeper.ZooKeeperProvider ConnectionStateChanged(Connected) DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,417 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,417 org.apache.zookeeper.client.ZooKeeperSaslClient Could not 
retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,417 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,418 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,418 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,431 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,438 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,443 org.apache.zookeeper.ClientCnxn Reading reply sessionid:0x139ff2e85b60006, packet:: clientPath:/ serverPath:/ finished:false header:: 1,12 replyHeader:: 1,8292982,0 request:: '/,F response:: v{'ro,'row-index,'zkbtest,'consumers,'reindex,'hotstandby,'bigdir,'vs,'orestes,'eurybates,'shardedcly,'row-locks,'id-counter,'zookeeper,'cly,'locks,'rwlocks,'tickets,'brokers},s{0,0,0,0,0,61,0,0,0,19,8292893} DEBUG [main-SendThread(10.0.2.106:2181)] 2012-09-25 14:03:16,444 org.apache.zookeeper.client.ZooKeeperSaslClient Could not retrieve login configuration: java.lang.SecurityException: Unable to locate a login configuration OK(Set(cly, row-locks, hotstandby, locks, tickets, bigdir, zkbtest, row-index, reindex, id-counter, eurybates, 
vs, rwlocks, shardedcly, brokers, consumers, zookeeper, orestes, ro),0,0,0,0,0,61,0,0,0,19,8292893) {code} |
242160 | No Perforce job exists for this issue. | 3 | 12707 | 7 years, 25 weeks, 6 days ago | 0|i02j67: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1549 | Data inconsistency when follower is receiving a DIFF with a dirty snapshot |
Bug | Open | Major | Unresolved | Flavio Paiva Junqueira | Jacky007 | Jacky007 | 10/Sep/12 03:58 | 05/Feb/20 07:15 | 3.4.3 | 3.7.0, 3.5.8 | quorum | 2 | 21 | ZOOKEEPER-1558, ZOOKEEPER-1559, ZOOKEEPER-2020 | ZOOKEEPER-107 | The trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is not correct. Here is a scenario (similar to 1154): Initial Condition 1. Let's say there are three nodes in the ensemble, A, B, C, with A being the leader 2. The current epoch is 7. 3. For simplicity of the example, let's say the zxid is a two-digit number, with the epoch being the first digit. 4. The zxid is 73 5. All the nodes have seen the change 73 and have persistently logged it. Step 1 A request with zxid 74 is issued. The leader A writes it to the log, but there is a crash of the entire ensemble and B, C never write the change 74 to their logs. Step 2 A, B restart; A is elected as the new leader, and A will load data and take a clean snapshot (change 74 is in it), then send a diff to B, but B dies before syncing with A. A dies later. Step 3 B, C restart, A is still down. B, C form the quorum and B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73; the epoch is now 8, the zxid is 80. A request with zxid 81 is successful. On B, minCommitLog is now 71, maxCommitLog is 81. Step 4 A starts up. It applies the change in the request with zxid 74 to its in-memory data tree. A contacts B to registerAsFollower and provides 74 as its zxid. Since 71<=74<=81, B decides to send A the diff. Problem: The problem with the above sequence is that after truncating the log, A will load the snapshot again, which is not correct. In the 3.3 branch, FileTxnSnapLog.restore does not call the listener (ZOOKEEPER-874), so the leader will send a snapshot to the follower and it will not be a problem. |
242161 | No Perforce job exists for this issue. | 3 | 12708 | 2 years, 32 weeks, 2 days ago | 0|i02j6f: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
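The ZOOKEEPER-1549 scenario simplifies the zxid to two digits, "epoch then counter". In the real protocol the zxid is a 64-bit value with the epoch in the high 32 bits and a per-epoch counter in the low 32 bits, which is what makes "zxid 81 > zxid 74" hold across the epoch bump. A quick sketch of that packing (helper names are illustrative):

```java
// zxid = (epoch << 32) | counter: a new epoch resets the counter but keeps
// all of its zxids ordered after every zxid of earlier epochs.
public class ZxidSketch {
    public static long zxid(long epoch, long counter) { return (epoch << 32) | counter; }
    public static long epochOf(long zxid)   { return zxid >>> 32; }
    public static long counterOf(long zxid) { return zxid & 0xffffffffL; }

    public static void main(String[] args) {
        long z74 = zxid(7, 4); // the "74" of the example
        long z81 = zxid(8, 1); // the "81" of the example
        System.out.println(epochOf(z81) + " " + counterOf(z81)); // 8 1
        System.out.println(z81 > z74); // true: the epoch dominates the comparison
    }
}
```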
| ZooKeeper | ZOOKEEPER-1548 | Cluster fails election loop in new and interesting way |
Bug | Closed | Major | Duplicate | Unassigned | Alan Horn | Alan Horn | 07/Sep/12 17:51 | 13/Mar/14 14:16 | 29/Aug/13 10:22 | 3.4.3 | 3.4.6 | leaderElection | 0 | 6 | ZOOKEEPER-1115 | Hi, We have a five node cluster, recently upgraded from 3.3.5 to 3.4.3. Was running fine for a few weeks after the upgrade, then the following sequence of events occurred : 1. All servers stopped responding to 'ruok' at the same time 2. Our local supervisor process restarted all of them at the same time (yes, this is bad, we didn't expect it to fail this way :) 3. The cluster would not serve requests after this. Appeared to be unable to complete an election. We tried various things at this point, none of which worked : * Moved around the restart order of the nodes (e.g. 4 thru 0, instead of 0 thru 4) * Reduced number of running nodes from 5 -> 3 to simplify the quorum, by only starting up 0, 1 & 2, in one test, and 0, 2 & 4 in the other * Removed the *Epoch files from version-2/ snapshot directory * Put the same version2/snapshot.xxxxx file on each server in the cluster * Added the (same on all nodes) last txlog onto each cluster * Kept only the last snapshot plus txlog unique on each server * Moved leaderServes=no to leaderServes=yes * Removed all files and started up with empty data as a control. This worked, but of course isn't terribly useful :) Finally, I brought the data up on a single node running in standalone and this worked (yay!) So at this point we brought the single node back into service and have kept the other four available to debug why the election is failing. We downgraded the four nodes to 3.3.5, and then they completed the election and started serving as expected. We did a rolling upgrade to 3.4.3, and everything was fine until we restarted the leader, whereupon we encountered the same re-election loop as before. We're a bit out of ideas at this point, so I was hoping someone from this list might have some useful input. 
Output from two followers and a leader during this condition are attached. Cheers, Al |
242162 | No Perforce job exists for this issue. | 3 | 12709 | 6 years, 2 weeks ago | 0|i02j6n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1547 | Test robustness of client using SASL in the presence of dropped requests |
Improvement | Open | Major | Unresolved | Unassigned | Eugene Joseph Koontz | Eugene Joseph Koontz | 04/Sep/12 16:42 | 06/Nov/12 00:13 | 0 | 2 | ZOOKEEPER-1437 | ZK clients send SASL packets to ZK servers as request packets. However, what if the server does not respond to the client's SASL packets? In this scenario, the server does not actually close the connection to the client; it simply fails to respond to SASL requests. Make sure the client can cope with this behavior. Background: In ZOOKEEPER-1437, Ben writes: "[I]t would be great to add a test that simply drops responses to clients without closing connections." https://issues.apache.org/jira/browse/ZOOKEEPER-1437?focusedCommentId=13447477&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13447477 Also in ZOOKEEPER-1437, Rakesh writes: "I could see DisconnectableZooKeeper.disconnect() has network delays/partition simulation logic." https://issues.apache.org/jira/browse/ZOOKEEPER-1437?focusedCommentId=13445704&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13445704 |
242163 | No Perforce job exists for this issue. | 0 | 12710 | 7 years, 29 weeks, 2 days ago | 0|i02j6v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
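The scenario above can be sketched with a fake transport (an illustrative stand-in, not ZooKeeper's client or test code): the key behavior to exercise is a client that bounds its wait on a server that accepts requests but never answers and never closes the connection.

```python
import queue

class SilentServer:
    """Fake transport: accepts request packets but never responds and
    never closes the connection (the scenario described above)."""
    def __init__(self):
        self.responses = queue.Queue()

    def send(self, packet):
        pass  # the request is swallowed; no response is ever enqueued

def sasl_handshake(server, timeout=0.1):
    """Client side: send a SASL token, but give up after `timeout`
    seconds instead of blocking forever on a silent server."""
    server.send(b"sasl-token")
    try:
        return server.responses.get(timeout=timeout)
    except queue.Empty:
        return None  # treat a silent server like a failed handshake

print(sasl_handshake(SilentServer()))  # None
```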
| ZooKeeper | ZOOKEEPER-1546 | "Unable to load database on disk" when restarting after node freeze |
Bug | Open | Major | Unresolved | Unassigned | Erik Forsberg | Erik Forsberg | 04/Sep/12 05:36 | 04/Jun/15 14:55 | 3.3.5 | server | 1 | 4 | One of my zookeeper servers in a quorum of 3 froze (probably due to underlying hardware problems). When restarting, zookeeper fails to start with the following in zookeeper.log: {noformat} 2012-09-04 09:02:35,300 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /etc/zookeeper/zoo.cfg 2012-09-04 09:02:35,316 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums 2012-09-04 09:02:35,333 - INFO [main:QuorumPeerMain@119] - Starting quorum peer 2012-09-04 09:02:35,358 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181 2012-09-04 09:02:35,379 - INFO [main:QuorumPeer@819] - tickTime set to 2000 2012-09-04 09:02:35,380 - INFO [main:QuorumPeer@830] - minSessionTimeout set to -1 2012-09-04 09:02:35,380 - INFO [main:QuorumPeer@841] - maxSessionTimeout set to -1 2012-09-04 09:02:35,386 - INFO [main:QuorumPeer@856] - initLimit set to 10 2012-09-04 09:02:35,523 - INFO [main:FileSnap@82] - Reading snapshot /var/zookeeper/version-2/snapshot.500017240 2012-09-04 09:02:38,944 - ERROR [main:FileTxnSnapLog@226] - Failed to increment parent cversion for: /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms at org.apache.zookeeper.server.DataTree.incrementCversion(DataTree.java:1218) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:224) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:152) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) at 
org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) 2012-09-04 09:02:38,945 - FATAL [main:QuorumPeer@400] - Unable to load database on disk java.io.IOException: Failed to process transaction type: 2 error: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:154) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) 2012-09-04 09:02:38,946 - FATAL [main:QuorumPeerMain@87] - Unexpected exception, exiting abnormally java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:401) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) Caused by: java.io.IOException: Failed to process transaction type: 2 error: KeeperErrorCode = NoNode for /osp/production/scheduler/waitfordeps_tasks/per_period-3092724ef4d611e18411525400fff018-bulkload_histograms at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:154) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at 
org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) ... 3 more {noformat} Removing the data from /var/zookeeper/version-2 and then restarting seems to "fix" the problem (it gets a snapshot from one of the other nodes in the quorum). This is Zookeeper 3.3.5+19.5-1~squeeze-cdh3, i.e. from Cloudera's distribution. |
242164 | No Perforce job exists for this issue. | 0 | 12711 | 4 years, 42 weeks ago | 0|i02j73: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1545 | very odd issue about zookeeper when deploying two web applications in one tomcat |
Bug | Open | Major | Unresolved | Unassigned | L.J.W | L.J.W | 04/Sep/12 02:20 | 23/Feb/19 03:06 | 3.4.3 | java client | 0 | 2 | OS:windows 7 32 zookeeper 3.4.3 tomcat 7.0.29 |
If I deploy two applications (both using zookeeper) to the same tomcat, the zookeeper client in one app will inexplicably disconnect when tomcat starts up. Following is my code; it is very simple: public class ZKTester implements InitializingBean, Watcher { private ZooKeeper hZooKeeper; public void afterPropertiesSet() throws Exception { hZooKeeper = new ZooKeeper("localhost:2181", 300000, this); } public void process(WatchedEvent event) { System.out.println("**************" + event); } } and the spring config file: <bean id="zooTester" class="com.abc.framework.cluster.ZKTester"/> And following is tomcat's startup log: ... **************WatchedEvent state:Disconnected type:None path:null **************WatchedEvent state:Expired type:None path:null ... |
242165 | No Perforce job exists for this issue. | 0 | 12712 | 1 year, 3 weeks, 5 days ago | 0|i02j7b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1544 | System.exit() calls on interrupted SyncThread |
Bug | Resolved | Trivial | Duplicate | Unassigned | Dawid Weiss | Dawid Weiss | 03/Sep/12 10:01 | 03/Sep/12 12:03 | 03/Sep/12 12:03 | 3.3.6 | 0 | 2 | We have a test framework at Lucene/Solr which attempts to interrupt threads that leak out of a single class (suite) scope. The problem we're facing is that ZooKeeper's SyncThread is doing this: {code} LOG.fatal("Severe unrecoverable error, exiting", t); System.exit(11); {code} Is terminating the JVM really needed here? Could it be made optional with a system property, or even removed entirely? Currently it aborts the entire JUnit runner and prevents successive tests from continuing. |
242166 | No Perforce job exists for this issue. | 0 | 12713 | 7 years, 29 weeks, 3 days ago | 0|i02j7j: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1543 | Bad sessionId/password combo should return auth failure |
Improvement | Open | Major | Unresolved | Unassigned | Ben Bangert | Ben Bangert | 31/Aug/12 14:40 | 10/Sep/12 19:34 | 3.4.3, 3.3.6, 3.5.0 | server | 1 | 4 | All | When connecting to a server with a valid session id but an invalid password, Zookeeper disconnects with a SESSION_EXPIRED error. This is blatantly false; it's actually the wrong password. Returning a SESSION_EXPIRED in this case is also not documented anywhere. This makes debugging this issue an absolute nightmare, since the server has already led you down the wrong track (trying to figure out why the session is expired, when it isn't). There's already an AUTH_FAILURE error; why not return that? |
242167 | No Perforce job exists for this issue. | 0 | 12714 | 7 years, 28 weeks, 3 days ago | 0|i02j7r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1542 | zkServer.sh start fails but exit status 0 |
Bug | Open | Major | Unresolved | Unassigned | Ryu Umayahara | Ryu Umayahara | 24/Aug/12 14:53 | 24/Aug/12 15:00 | 3.3.6 | scripts | 0 | 1 | Windows 7 + Cygwin | zkServer.sh 99 nohup $JAVA "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \ 100 -cp "$CLASSPATH" $JVMFLAGS $ZOOMAIN "$ZOOCFG" > "$_ZOO_DAEMON_OUT" 2>&1 < /dev/null & (The exit status of a background process cannot be captured this way.) 101 if [ $? -eq 0 ] 102 then 103 if /bin/echo -n $! > "$ZOOPIDFILE" 104 then 105 sleep 1 106 echo STARTED 107 else 108 echo FAILED TO WRITE PID 109 exit 1 110 fi 111 else 112 echo SERVER DID NOT START 113 exit 1 114 fi |
242168 | No Perforce job exists for this issue. | 1 | 12715 | 7 years, 30 weeks, 6 days ago | 0|i02j7z: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
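The report can be demonstrated directly: after launching a job with `&`, `$?` reflects only the success of *starting* the background job, so the `SERVER DID NOT START` branch quoted above is effectively unreachable. The liveness check at the end is one possible remedy sketched for illustration, not the actual zkServer.sh fix.

```shell
# $? after '&' is the status of backgrounding, not of the job itself.
false &                    # the job exits non-zero...
echo "status-after-bg=$?"  # ...but this prints status-after-bg=0

wait $!                    # 'wait' does recover the job's real status
echo "job-status=$?"       # prints job-status=1

# Sketch of a remedy: after a grace period, check the daemon is alive.
sleep 5 & pid=$!
sleep 1
if kill -0 "$pid" 2>/dev/null; then
    echo "STARTED"
else
    echo "SERVER DID NOT START"
fi
kill "$pid" 2>/dev/null    # clean up the demo daemon
```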
| ZooKeeper | ZOOKEEPER-1541 | Zookeeper distributions are not available. |
Bug | Resolved | Critical | Cannot Reproduce | Unassigned | Yuta Okamoto | Yuta Okamoto | 24/Aug/12 01:09 | 09/Oct/13 02:19 | 09/Oct/13 02:19 | 0 | 2 | I can't download zookeeper distribution because of "404 Not Found". http://www.apache.org/dist/zookeeper/ |
242169 | No Perforce job exists for this issue. | 0 | 12716 | 6 years, 24 weeks, 1 day ago | 0|i02j87: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1540 | ZOOKEEPER-1411 breaks backwards compatibility |
Bug | Resolved | Major | Fixed | Andrew Ferguson | Andrew Ferguson | Andrew Ferguson | 23/Aug/12 13:51 | 02/Mar/16 20:34 | 25/Sep/12 01:33 | 3.5.0 | 3.5.0 | 0 | 5 | ZOOKEEPER-1411 | There is a one-line bug in ZOOKEEPER-1411 which breaks backwards compatibility for sites which are using separate configuration files for each server. The bug is with the handling of the clientPort option. One line fix to follow shortly. thanks! Andrew |
242170 | No Perforce job exists for this issue. | 2 | 12717 | 7 years, 26 weeks, 2 days ago | 0|i02j8f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1539 | Tests in QuorumUtil.startAll() and JMXenv |
Bug | Open | Minor | Unresolved | Unassigned | Alexander Shraer | Alexander Shraer | 22/Aug/12 00:59 | 22/Sep/12 21:49 | tests | 0 | 1 | Consider the following test: @Test public void newTest() throws Exception { QuorumUtil qu = new QuorumUtil(3); qu.startAll(); } Although it doesn't seem like we're checking anything at all here, this test actually fails. There is a JMXEnv.ensureAll test invoked from startAll(). It passes for QuorumUtil(1) or QuorumUtil(2) servers but fails for any larger number. Besides the fact that there's a bug in the tests, I think we should call the function differently if we want to invoke tests in it, or alternatively remove these tests or make them optional using some parameter. |
242171 | No Perforce job exists for this issue. | 0 | 12718 | 7 years, 31 weeks, 1 day ago | 0|i02j8n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1538 | Improve space handling in zkServer.sh and zkEnv.sh |
Bug | Resolved | Trivial | Fixed | Andrew Ferguson | Andrew Ferguson | Andrew Ferguson | 21/Aug/12 20:00 | 25/Jun/13 14:07 | 07/Sep/12 02:23 | 3.4.3 | 3.5.0 | 0 | 6 | Running `bin/zkServer.sh start` from a freshly-built copy of trunk fails if the source code is checked out to a directory with spaces in the name. I'll include a small fix for this problem. thanks! |
242172 | No Perforce job exists for this issue. | 1 | 12719 | 6 years, 39 weeks, 2 days ago | 0|i02j8v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1537 | registration page not accepting capital letters |
Bug | Resolved | Minor | Incomplete | Unassigned | mohammad taher | mohammad taher | 17/Aug/12 10:20 | 30/Aug/12 01:39 | 30/Aug/12 01:39 | 3.3.5 | c client | 0 | 3 | 1510560 | 1510560 | 0% | WINDOWS XP, MOZILLA FIREFOX, 500 GB HARD DISK, 2 GB RAM |
1.Type zookeeper URL in the address bar to go to home page of it. 2.For new users, click on "new user" and it will open a registration form. 3.Give your full name in capital letters as mentioned. 4.Even though I give capital letters it is not accepting and is giving an error message as "PLEASE TYPE CAPITAL LETTERS" |
0% | 0% | 1510560 | 1510560 | performance | 242173 | No Perforce job exists for this issue. | 0 | 12720 | 7 years, 30 weeks ago | 0|i02j93: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1536 | c client : memory leak in winport.c |
Bug | Resolved | Major | Fixed | brooklin | brooklin | brooklin | 15/Aug/12 23:13 | 31/Aug/12 07:02 | 30/Aug/12 16:38 | 3.4.3 | 3.4.4, 3.5.0 | c client | 0 | 5 | windows7 | At line 99 in winport.c, use windows API "InitializeCriticalSection" but never call "DeleteCriticalSection" | 242174 | No Perforce job exists for this issue. | 1 | 12721 | 7 years, 29 weeks, 6 days ago | 0|i02j9b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1535 | ZK Shell/Cli re-executes last command on exit |
Bug | Closed | Major | Fixed | Edward Ribeiro | Stu Hood | Stu Hood | 14/Aug/12 19:56 | 20/May/17 19:07 | 30/Dec/12 22:21 | 3.4.6, 3.5.0 | scripts | 0 | 6 | ZOOKEEPER-1897, ZOOKEEPER-2787, HBASE-10903 | zookeeper-3.4.3 release | In the ZK 3.4.3 release's version of zkCli.sh, the last command that was executed is *re*-executed when you {{ctrl+d}} out of the shell. In the snippet below, {{ls}} is executed; then, when {{ctrl+d}} is triggered (inserted below to illustrate), the output from {{ls}} appears again because the command is re-run. {noformat} [zk: zookeeper.example.com:2181(CONNECTED) 0] ls /blah [foo] [zk: zookeeper.example.com:2181(CONNECTED) 1] <ctrl+d> [foo] $ {noformat} |
cli, shell, zkcli, zkcli.sh | 242175 | No Perforce job exists for this issue. | 2 | 12722 | 6 years, 2 weeks ago | 0|i02j9j: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
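The re-execution above is characteristic of a read loop that does not treat end-of-input as distinct from a readable line. A toy reproduction of that class of bug (illustrative only, not jline's or zkCli's actual code):

```python
import io

def run_shell(stream, eof_safe):
    """Tiny REPL loop: reads commands and records what gets executed."""
    last_cmd, executed = "", []
    while True:
        line = stream.readline()
        if line == "":                 # EOF (e.g. ctrl+d)
            if eof_safe:
                break                  # correct: just exit the shell
            executed.append(last_cmd)  # buggy: re-runs the stored command
            break
        last_cmd = line.strip()
        executed.append(last_cmd)
    return executed

print(run_shell(io.StringIO("ls /blah\n"), eof_safe=False))  # ['ls /blah', 'ls /blah']
print(run_shell(io.StringIO("ls /blah\n"), eof_safe=True))   # ['ls /blah']
```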
| ZooKeeper | ZOOKEEPER-1534 | Zookeeper server does not send SASL authentication failure notification to the client |
Bug | Open | Major | Unresolved | Unassigned | Tally Tsabary | Tally Tsabary | 13/Aug/12 04:55 | 14/Feb/18 15:46 | 3.4.3 | server | 0 | 5 | Windows 7. Zookeeper 3.4.3 Curator 1.1.15 Java 1.6 | Server side: zookeeper 3.4.3 with patch ZOOKEEPER-1437.patch 22/Jun/12 00:24 Client side: java, Curator 1.1.15, zookeeper 3.4.3 with patch ZOOKEEPER-1437.patch 22/Jun/12 00:24 Environment configured to use Sasl authentication. While authentication is successful, everything works fine. In case of authentication failure, it seems that the zk server catches the SaslException and closes the socket without sending any additional notification to the client, so although the client has an implementation to handle Sasl authentication failure, it is never used… Details: ========= zk server log: {noformat} 2012-08-10 11:00:46,730 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] - Accepted socket connection from /127.0.0.1:50208 2012-08-10 11:00:46,731 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@780] - Session establishment request from client /127.0.0.1:50208 client's lastZxid is 0x0 2012-08-10 11:00:46,731 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client attempting to establish new session at /127.0.0.1:50208 2012-08-10 11:00:46,733 [myid:] - DEBUG [SyncThread:0:FinalRequestProcessor@88] - Processing request:: sessionid:0x1390fd2ee630004 type:createSession cxid:0x0 zxid:0x26b txntype:-10 reqpath:n/a 2012-08-10 11:00:46,733 [myid:] - DEBUG [SyncThread:0:FinalRequestProcessor@160] - sessionid:0x1390fd2ee630004 type:createSession cxid:0x0 zxid:0x26b txntype:-10 reqpath:n/a 2012-08-10 11:00:46,734 [myid:] - INFO [SyncThread:0:ZooKeeperServer@604] - Established session 0x1390fd2ee630004 with negotiated timeout 40000 for client /127.0.0.1:50208 2012-08-10 11:00:46,736 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@919] - Responding to client SASL token. 
2012-08-10 11:00:46,736 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@923] - Size of client SASL token: 0 2012-08-10 11:00:46,736 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@954] - Size of server SASL response: 101 2012-08-10 11:00:46,740 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@919] - Responding to client SASL token. 2012-08-10 11:00:46,741 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@923] - Size of client SASL token: 272 2012-08-10 11:00:46,741 [myid:] - DEBUG [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@106] - client supplied realm: zk-sasl-md5 2012-08-10 11:00:46,741 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@939] - Client failed to SASL authenticate: javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. 2012-08-10 11:00:46,742 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@945] - Closing client connection due to SASL authentication failure. 
2012-08-10 11:00:46,742 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:50208 which had sessionid 0x1390fd2ee630004 2012-08-10 11:00:46,743 [myid:] - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@180] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1075) at org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:906) at org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:365) at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:202) at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:236) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224) at java.lang.Thread.run(Thread.java:662) {noformat} At the corresponding source: org.apache.zookeeper.server.ZooKeeperServer {noformat} private Record processSasl(ByteBuffer incomingBuffer, ServerCnxn cnxn) throws IOException { LOG.debug("Responding to client SASL token."); GetSASLRequest clientTokenRecord = new GetSASLRequest(); ByteBufferInputStream.byteBuffer2Record(incomingBuffer,clientTokenRecord); byte[] clientToken = clientTokenRecord.getToken(); LOG.debug("Size of client SASL token: " + clientToken.length); byte[] responseToken = null; try { ZooKeeperSaslServer saslServer = cnxn.zooKeeperSaslServer; try { // note that clientToken might be empty (clientToken.length == 0): // if using the DIGEST-MD5 mechanism, clientToken will be empty at the beginning of the // SASL negotiation process. 
responseToken = saslServer.evaluateResponse(clientToken); if (saslServer.isComplete() == true) { String authorizationID = saslServer.getAuthorizationID(); LOG.info("adding SASL authorization for authorizationID: " + authorizationID); cnxn.addAuthInfo(new Id("sasl",authorizationID)); } } catch (SaslException e) { LOG.warn("Client failed to SASL authenticate: " + e); if ((System.getProperty("zookeeper.allowSaslFailedClients") != null) && (System.getProperty("zookeeper.allowSaslFailedClients").equals("true"))) { LOG.warn("Maintaining client connection despite SASL authentication failure."); } else { LOG.warn("Closing client connection due to SASL authentication failure."); cnxn.close(); // Tally: at this stage the socket is closed without sending any notification to the client } } } catch (NullPointerException e) { LOG.error("cnxn.saslServer is null: cnxn object did not initialize its saslServer properly."); } if (responseToken != null) { LOG.debug("Size of server SASL response: " + responseToken.length); } // wrap SASL response token to client inside a Response object. return new SetSASLResponse(responseToken); } {noformat} The client log shows that the client identified the socket closure and just retries to connect as if the zk server had just gone down. {noformat} [10-Aug-2012 11:00:44.558 IST] INFO <org.apache.zookeeper.ClientCnxn$SendThread> Opening socket connection to server 127.0.0.1/127.0.0.1:2181 [10-Aug-2012 11:00:44.559 IST] INFO <org.apache.zookeeper.client.ZooKeeperSaslClient> Found Login Context section 'Client': will use it to attempt to SASL-authenticate. [10-Aug-2012 11:00:44.560 IST] INFO <org.apache.zookeeper.client.ZooKeeperSaslClient> Client will use DIGEST-MD5 as SASL mechanism. 
[10-Aug-2012 11:00:44.561 IST] INFO <org.apache.zookeeper.ClientCnxn$SendThread> Socket connection established to 127.0.0.1/127.0.0.1:2181, initiating session [10-Aug-2012 11:00:44.563 IST] DEBUG <org.apache.zookeeper.ClientCnxn$SendThread> Session establishment request sent on 127.0.0.1/127.0.0.1:2181 [10-Aug-2012 11:00:44.564 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes. [10-Aug-2012 11:00:44.566 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes. [10-Aug-2012 11:00:44.568 IST] INFO <org.apache.zookeeper.ClientCnxn$SendThread> Session establishment complete on server 127.0.0.1/127.0.0.1:2181, sessionid = 0x1390fd2ee630003, negotiated timeout = 40000 [10-Aug-2012 11:00:44.569 IST] INFO <com.netflix.curator.framework.state.ConnectionStateManager> State change: RECONNECTED [10-Aug-2012 11:00:44.569 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes. [10-Aug-2012 11:00:44.572 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes. [10-Aug-2012 11:00:44.574 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes. 
[10-Aug-2012 11:00:44.576 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes. [10-Aug-2012 11:00:44.578 IST] DEBUG <org.apache.zookeeper.client.ZooKeeperSaslClient> ClientCnxn:sendSaslPacket:length=0 [10-Aug-2012 11:00:44.579 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes. [10-Aug-2012 11:00:44.581 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes. [10-Aug-2012 11:00:44.583 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes. [10-Aug-2012 11:00:44.585 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes. [10-Aug-2012 11:00:44.587 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes. [10-Aug-2012 11:00:44.589 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes. 
[10-Aug-2012 11:00:44.591 IST] DEBUG <org.apache.zookeeper.client.ZooKeeperSaslClient$2> saslClient.evaluateChallenge(len=101) [10-Aug-2012 11:00:44.592 IST] DEBUG <org.apache.zookeeper.client.ZooKeeperSaslClient> ClientCnxn:sendSaslPacket:length=272 [10-Aug-2012 11:00:44.593 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:null serverPath:null finished:false header:: 0,3 replyHeader:: 0,0,0 request:: '/dev,F response:: until SASL authentication completes. [10-Aug-2012 11:00:44.596 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes. [10-Aug-2012 11:00:44.598 IST] DEBUG <org.apache.zookeeper.ClientCnxnSocketNIO> deferring non-priming packet: clientPath:/ serverPath:/ finished:false header:: 0,9 replyHeader:: 0,0,0 request:: '/ response:: until SASL authentication completes. [10-Aug-2012 11:00:44.600 IST] INFO <org.apache.zookeeper.ClientCnxn$SendThread> Unable to read additional data from server sessionid 0x1390fd2ee630003, likely server has closed socket, closing socket connection and attempting reconnect [10-Aug-2012 11:00:44.701 IST] ERROR <com.netflix.curator.framework.imps.CuratorFrameworkImpl> Background operation retry gave up org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at com.netflix.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:438) at com.netflix.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:49) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:606) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) [10-Aug-2012 11:00:44.706 IST] INFO <com.netflix.curator.framework.state.ConnectionStateManager> 
State change: LOST [10-Aug-2012 11:00:44.708 IST] WARN <com.netflix.curator.framework.state.ConnectionStateManager> ConnectionStateManager queue full - dropping events to make room [10-Aug-2012 11:00:44.710 IST] INFO <com.netflix.curator.framework.state.ConnectionStateManager> State change: SUSPENDED {noformat} |
242176 | No Perforce job exists for this issue. | 0 | 12723 | 2 years, 5 weeks, 1 day ago | 0|i02j9r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
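What the reporter asks for can be sketched as follows. This is an illustrative pseudo-server, not ZooKeeper's code, and AUTH_FAILED here is an assumed placeholder code standing in for whatever notification the server would actually send:

```python
AUTH_FAILED = -115  # assumption: placeholder auth-failure code

class FakeConnection:
    """Minimal stand-in for a server-side client connection."""
    def __init__(self):
        self.sent = []
        self.closed = False

    def send(self, msg):
        self.sent.append(msg)

    def close(self):
        self.closed = True

def handle_sasl_failure(cnxn):
    # Notify the client *before* closing, so it can distinguish an
    # authentication failure from an ordinary connection loss.
    cnxn.send(("sasl-response", AUTH_FAILED))
    cnxn.close()

c = FakeConnection()
handle_sasl_failure(c)
print(c.sent, c.closed)  # [('sasl-response', -115)] True
```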
| ZooKeeper | ZOOKEEPER-1533 | Correct the documentation of the args for the JavaExample doc. |
Bug | Resolved | Minor | Fixed | Warren Turkal | Warren Turkal | Warren Turkal | 13/Aug/12 02:50 | 02/Mar/16 20:35 | 14/Aug/12 19:11 | 3.3.0, 3.3.1, 3.3.2, 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3, 3.3.5, 3.3.6, 3.4.4, 3.5.0 | 3.5.0 | documentation | 0 | 4 | Small doc fix in the JavaExample doc. | 242177 | No Perforce job exists for this issue. | 1 | 12724 | 7 years, 32 weeks, 1 day ago | 0|i02j9z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1532 | Correct the documentation of the args for the JavaExample doc. |
Improvement | Resolved | Major | Invalid | Unassigned | Warren Turkal | Warren Turkal | 09/Aug/12 19:13 | 09/May/14 17:44 | 09/May/14 17:44 | 3.5.0 | 0 | 1 | Small doc fix. | 242178 | No Perforce job exists for this issue. | 0 | 12725 | 5 years, 47 weeks, 6 days ago | 0|i02ja7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1531 | Correct the documentation of the args for the JavaExample doc. |
Bug | Resolved | Major | Duplicate | Unassigned | Warren Turkal | Warren Turkal | 09/Aug/12 19:13 | 03/Sep/13 02:53 | 03/Sep/13 02:53 | 3.5.0 | 0 | 1 | Small doc fix. | 242179 | No Perforce job exists for this issue. | 0 | 12726 | 6 years, 29 weeks, 3 days ago | 0|i02jaf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1530 | Correct the documentation of the args for the JavaExample doc. |
Bug | Resolved | Major | Duplicate | Unassigned | Warren Turkal | Warren Turkal | 09/Aug/12 19:13 | 03/Sep/13 02:54 | 03/Sep/13 02:54 | 0 | 0 | Small doc fix. | 242180 | No Perforce job exists for this issue. | 0 | 12727 | 7 years, 33 weeks ago | 0|i02jan: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1529 | Correct the documentation of the args for the JavaExample doc. |
Bug | Resolved | Minor | Duplicate | Unassigned | Warren Turkal | Warren Turkal | 09/Aug/12 16:51 | 03/Sep/13 02:54 | 03/Sep/13 02:54 | 0 | 0 | Correct the documentation of the args for the JavaExample doc. | 242181 | No Perforce job exists for this issue. | 0 | 12728 | 7 years, 33 weeks ago | 0|i02jav: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1528 | Correct the documentation of the args for the JavaExample doc. |
Bug | Resolved | Minor | Duplicate | Unassigned | Warren Turkal | Warren Turkal | 09/Aug/12 16:50 | 03/Sep/13 02:55 | 03/Sep/13 02:55 | 0 | 0 | I added another listitem documenting the filename arg of the JavaExample code. | 242182 | No Perforce job exists for this issue. | 0 | 12729 | 7 years, 33 weeks ago | 0|i02jb3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1527 | Correct the documentation of the args for the JavaExample doc |
Bug | Resolved | Trivial | Duplicate | Unassigned | Warren Turkal | Warren Turkal | 09/Aug/12 16:50 | 11/Oct/13 12:39 | 11/Oct/13 12:39 | 0 | 0 | I added another listitem documenting the filename arg of the JavaExample code. | 242183 | No Perforce job exists for this issue. | 0 | 12730 | 6 years, 23 weeks, 6 days ago | 0|i02jbb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1526 | Correct the documentation of the args for the JavaExample doc |
Bug | Open | Trivial | Unresolved | Unassigned | Warren Turkal | Warren Turkal | 09/Aug/12 16:50 | 09/Aug/12 16:50 | 0 | 0 | I added another listitem documenting the filename arg of the JavaExample code. | 242184 | No Perforce job exists for this issue. | 0 | 12731 | 7 years, 33 weeks ago | 0|i02jbj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1525 | Plumb ZooKeeperServer object into auth plugins |
Improvement | Resolved | Major | Fixed | Jordan Zimmerman | Warren Turkal | Warren Turkal | 02/Aug/12 19:35 | 21/Nov/16 17:43 | 17/Nov/16 11:20 | 3.5.0 | 3.6.0 | 7 | 11 | ZOOKEEPER-2143 | I want to plumb the ZooKeeperServer object into the auth plugins so that I can store authentication data in zookeeper itself. With access to the ZooKeeperServer object, I also have access to the ZKDatabase and can look up entries in the local copy of the zookeeper data. In order to implement this, I make sure that a ZooKeeperServer instance is passed in to the ProviderRegistry.initialize() method. Then initialize() will try to find a constructor for the AuthenticationProvider that takes a ZooKeeperServer instance. If the constructor is found, it will be used. Otherwise, initialize() will look for a constructor that takes no arguments and use that instead. |
239687 | No Perforce job exists for this issue. | 12 | 2543 | 3 years, 17 weeks, 3 days ago | Plumb ZooKeeperServer object into auth plugins. | 0|i00sgn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
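The constructor-selection logic described in ZOOKEEPER-1525 can be sketched as follows. This is a minimal sketch, not the actual ZooKeeper code: `ProviderLoader`, `ServerAwareProvider`, and `LegacyProvider` are hypothetical stand-ins, and only the reflection pattern (prefer a constructor taking a `ZooKeeperServer`, else fall back to the no-arg constructor) mirrors the issue's proposal.

```java
import java.lang.reflect.Constructor;

public class ProviderLoader {
    // Hypothetical stand-ins; the real types live in org.apache.zookeeper.server.
    public static class ZooKeeperServer {}
    public interface AuthenticationProvider {}

    public static class ServerAwareProvider implements AuthenticationProvider {
        public final ZooKeeperServer zks;
        public ServerAwareProvider(ZooKeeperServer zks) { this.zks = zks; }
    }

    public static class LegacyProvider implements AuthenticationProvider {
        public LegacyProvider() {}
    }

    // Prefer a constructor taking a ZooKeeperServer; otherwise fall back
    // to the no-arg constructor, as the issue describes.
    public static AuthenticationProvider instantiate(
            Class<? extends AuthenticationProvider> cls, ZooKeeperServer zks)
            throws Exception {
        try {
            Constructor<? extends AuthenticationProvider> c =
                    cls.getConstructor(ZooKeeperServer.class);
            return c.newInstance(zks);
        } catch (NoSuchMethodException e) {
            return cls.getConstructor().newInstance();
        }
    }

    public static void main(String[] args) throws Exception {
        ZooKeeperServer zks = new ZooKeeperServer();
        System.out.println(instantiate(ServerAwareProvider.class, zks).getClass().getSimpleName());
        System.out.println(instantiate(LegacyProvider.class, zks).getClass().getSimpleName());
    }
}
```

With this shape, a provider that wants access to the ZKDatabase simply declares the richer constructor; existing providers keep working unchanged.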
| ZooKeeper | ZOOKEEPER-1524 | use more standard junit annotation "@Before" in SaslXTests rather than static blocks |
Improvement | Open | Minor | Unresolved | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 01/Aug/12 15:39 | 01/Aug/12 15:56 | tests | 0 | 1 | ZOOKEEPER-1503 | The following tests: AuthTest.java, SaslAuthFailTest.java, SaslAuthDesignatedClientTest.java, SaslAuthFailDesignatedClientTest.java, SaslAuthMissingClientConfigTest.java, and SaslAuthTest.java use "static {..}" blocks to initialize system properties and files prior to the test runs. As Patrick points out in ZOOKEEPER-1503, we should instead use JUnit's @Before annotation (http://junit.sourceforge.net/javadoc/org/junit/Before.html) rather than static blocks, to make our tests more consistent and easier to understand. |
242185 | No Perforce job exists for this issue. | 0 | 12732 | 7 years, 34 weeks, 1 day ago | 0|i02jbr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
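The refactoring ZOOKEEPER-1524 proposes can be illustrated with a minimal sketch. The class name and property value below are hypothetical; a real test would annotate the methods with JUnit's @Before/@After instead of the explicit calls shown here, so the setup runs before each test and can be torn down afterward, which a `static {..}` block cannot do.

```java
public class SaslTestSetupSketch {
    // With a static block (the pattern the issue wants removed), the
    // property would be set once at class load and never reset.
    // JUnit's @Before runs a method like this before *each* test, so
    // state can be (re)initialized and cleaned up per test.
    static void setUp() {   // would carry @org.junit.Before in a real test
        System.setProperty("java.security.auth.login.config", "/tmp/jaas.conf");
    }

    static void tearDown() { // would carry @org.junit.After
        System.clearProperty("java.security.auth.login.config");
    }

    public static void main(String[] args) {
        setUp();
        System.out.println(System.getProperty("java.security.auth.login.config"));
        tearDown();
        System.out.println(System.getProperty("java.security.auth.login.config"));
    }
}
```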
| ZooKeeper | ZOOKEEPER-1523 | Better logging during instance loading/syncing |
Improvement | Open | Critical | Unresolved | Unassigned | Jordan Zimmerman | Jordan Zimmerman | 31/Jul/12 17:26 | 09/Jul/19 16:00 | 3.3.5 | quorum, server | 0 | 6 | 0 | 9000 | When an instance is coming up and loading from snapshot, better logging is needed so an operator knows how long until completion. Also, when syncing with the leader, better logging is needed to know how long until success. | 100% | 100% | 9000 | 0 | pull-request-available | 242186 | No Perforce job exists for this issue. | 0 | 12733 | 36 weeks, 2 days ago | 0|i02jbz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
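A minimal sketch of the kind of progress logging ZOOKEEPER-1523 asks for during snapshot load: periodic "N of M" messages so an operator can estimate time to completion. The counts, interval, and message format below are illustrative assumptions, not ZooKeeper's actual output.

```java
public class LoadProgress {
    // Hypothetical progress reporting while replaying transactions from a
    // snapshot: emit a percentage line every logEvery records instead of
    // staying silent until the load finishes.
    public static void main(String[] args) {
        long total = 1_000_000, logEvery = 250_000;
        for (long n = 1; n <= total; n++) {
            if (n % logEvery == 0) {
                System.out.println("loaded " + n + " / " + total
                        + " txns (" + (100 * n / total) + "%)");
            }
        }
    }
}
```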
| ZooKeeper | ZOOKEEPER-1522 | intermittent failures in Zab test due to NPE in recursiveDelete test function |
Bug | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 30/Jul/12 18:27 | 01/Aug/12 17:12 | 01/Aug/12 11:57 | 3.4.3, 3.5.0 | 3.4.4, 3.5.0 | tests | 0 | 3 | The jdk7 test job on jenkins is failing intermittently with {noformat} java.lang.NullPointerException at org.apache.zookeeper.server.quorum.Zab1_0Test.recursiveDelete(Zab1_0Test.java:917) at org.apache.zookeeper.server.quorum.Zab1_0Test.recursiveDelete(Zab1_0Test.java:918) at org.apache.zookeeper.server.quorum.Zab1_0Test.recursiveDelete(Zab1_0Test.java:918) at org.apache.zookeeper.server.quorum.Zab1_0Test.testPopulatedLeaderConversation(Zab1_0Test.java:419) at org.apache.zookeeper.server.quorum.Zab1_0Test.testUnnecessarySnap(Zab1_0Test.java:483) {noformat} It seems the code is not handling the case where the file is deleted out from under it. Also, the recursive deletes should be at the very end of the finally block, I would think. |
242016 | No Perforce job exists for this issue. | 1 | 12501 | 7 years, 34 weeks, 1 day ago |
Reviewed
|
0|i02hwf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
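A defensive version of a recursive-delete helper that avoids the NPE pattern in the stack trace above. This is a sketch of the fix direction, not the actual Zab1_0Test code: `File.listFiles()` returns null when the directory has vanished or cannot be read, which matches the race the report describes.

```java
import java.io.File;

public class RecursiveDelete {
    // Delete a directory tree, tolerating files deleted out from under us.
    static void recursiveDelete(File f) {
        File[] children = f.listFiles();
        if (children != null) {   // null-check guards the race; null != empty dir
            for (File c : children) {
                recursiveDelete(c);
            }
        }
        f.delete();               // ignore failure; the file may already be gone
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"),
                "zab-test-" + System.nanoTime());
        new File(dir, "sub").mkdirs();
        recursiveDelete(dir);
        System.out.println(dir.exists());
    }
}
```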
| ZooKeeper | ZOOKEEPER-1521 | LearnerHandler initLimit/syncLimit problems specifying follower socket timeout limits |
Bug | Resolved | Critical | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 26/Jul/12 11:44 | 29/Jul/12 07:02 | 29/Jul/12 01:08 | 3.4.3, 3.3.5, 3.5.0 | 3.3.6, 3.4.4, 3.5.0 | server | 0 | 9 | branch 3.3: The leader is expecting the follower to initialize in syncLimit time rather than initLimit. In LearnerHandler.run() line 395 (branch33) we look for the ack from the follower with a timeout of syncLimit. branch 3.4+: it seems ZOOKEEPER-1136 introduced a regression while attempting to fix the problem: it sets the timeout to initLimit, but it never sets the timeout to syncLimit once the ack is received. |
242017 | No Perforce job exists for this issue. | 3 | 12504 | 7 years, 34 weeks, 4 days ago |
Reviewed
|
0|i02hx3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
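The intended timeout semantics in ZOOKEEPER-1521 can be summarized numerically: the leader should allow tickTime multiplied by initLimit for a follower's initial sync, then switch the socket timeout to tickTime multiplied by syncLimit once the follower's ack arrives. The values below are the common example defaults, assumed for illustration, not taken from the issue.

```java
public class QuorumTimeouts {
    // Sketch of the two timeout regimes the issue describes: one for
    // follower initialization, a tighter one for steady-state exchanges.
    public static void main(String[] args) {
        int tickTime = 2000;  // ms per tick
        int initLimit = 10;   // ticks allowed for initial sync with the leader
        int syncLimit = 5;    // ticks allowed between leader/follower exchanges
        System.out.println("init timeout ms: " + tickTime * initLimit);
        System.out.println("sync timeout ms: " + tickTime * syncLimit);
    }
}
```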
| ZooKeeper | ZOOKEEPER-1520 | A txn log record with a corrupt sentinel byte looks like EOF |
Bug | Open | Minor | Unresolved | Bill Bridge | Bill Bridge | Bill Bridge | 25/Jul/12 14:42 | 05/Feb/20 07:16 | 3.3.5 | 3.7.0, 3.5.8 | server | 1 | 4 | 86400 | 86400 | 0% | all | In Util.readTxnBytes() the sentinel is compared with 0x42, and if it does not match then the record is considered partially written and thus treated as EOF. However, if it is a partial record, the sentinel should be 0x00 since that is what the log is initialized with. Any other value would indicate corruption and should throw an IOException rather than indicate EOF. See [ZOOKEEPER-1453|https://issues.apache.org/jira/browse/ZOOKEEPER-1453] for a related issue. | 0% | 0% | 86400 | 86400 | newbie, patch | 239662 | No Perforce job exists for this issue. | 5 | 2495 | 5 years, 51 weeks, 2 days ago | 0|i00s5z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
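The three-way sentinel check the report proposes can be sketched as follows: 0x42 means a valid record follows, 0x00 means a partially written tail (genuine EOF), and any other value means corruption and should surface as an IOException. `TxnSentinel` and `isEof` are hypothetical names, not the actual `Util.readTxnBytes()` code.

```java
import java.io.IOException;

public class TxnSentinel {
    static final int VALID = 0x42;

    // 0x42 -> complete record follows; 0x00 -> pre-initialized log tail (EOF);
    // anything else -> corruption, reported instead of silently treated as EOF.
    static boolean isEof(byte sentinel) throws IOException {
        if (sentinel == VALID) return false;
        if (sentinel == 0x00) return true;
        throw new IOException("corrupt sentinel: 0x"
                + Integer.toHexString(sentinel & 0xff));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isEof((byte) 0x42));
        System.out.println(isEof((byte) 0x00));
        try {
            isEof((byte) 0x13);
        } catch (IOException e) {
            System.out.println("corrupt");
        }
    }
}
```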
| ZooKeeper | ZOOKEEPER-1519 | Zookeeper Async calls can reference free()'d memory |
Bug | Open | Major | Unresolved | Daniel Lescohier | Mark Gius | Mark Gius | 25/Jul/12 13:34 | 05/Feb/20 07:16 | 3.3.3, 3.3.6 | 3.7.0, 3.5.8 | c client | 0 | 8 | Ubuntu 11.10, Ubuntu packaged Zookeeper 3.3.3 with some backported fixes. | zoo_acreate() and zoo_aset() take a char * argument for data and prepare a call to zookeeper. This char * doesn't seem to be duplicated at any point, making it possible that the caller of the asynchronous function might potentially free() the char * argument before the zookeeper library completes its request. This is unlikely to present a real problem unless the freed memory is re-used before zookeeper consumes it. I've been unable to reproduce this issue using pure C as a result. However, ZKPython is a whole different story. Consider this snippet: ok = zookeeper.acreate(handle, path, json.dumps(value), acl, flags, callback) assert ok == zookeeper.OK In this snippet, json.dumps() allocates a string which is passed into the acreate(). When acreate() returns, the zookeeper request has been constructed with a pointer to the string allocated by json.dumps(). Also when acreate() returns, that string is now referenced by 0 things (ZKPython doesn't bump the refcount) and the string is eligible for garbage collection and re-use. The Zookeeper request now has a pointer to dangerous freed memory. I've been seeing odd behavior in our development environments for some time now, where it appeared as though two separate JSON payloads had been joined together. Python has been allocating a new JSON string in the middle of the old string that an incomplete zookeeper async call had not yet processed. I am not sure if this is a behavior that should be documented, or if the C binding implementation needs to be updated to create copies of the data payload provided for aset and acreate. |
242187 | No Perforce job exists for this issue. | 1 | 12734 | 6 years, 19 weeks ago | 0|i02jc7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
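The dangling-buffer hazard in ZOOKEEPER-1519 is specific to the C client, but the fix direction the report discusses (copy the payload at submit time rather than keeping the caller's pointer) can be illustrated with a Java analogue. `acreateUnsafe` and `acreateSafe` below are hypothetical stand-ins, not real ZooKeeper APIs.

```java
import java.util.Arrays;

public class AsyncCopySketch {
    // Stands in for the request the library would send later.
    static byte[] pending;

    // Keeps a reference to the caller's buffer: if the caller reuses or
    // frees it before the request is sent, the request sees garbage.
    static void acreateUnsafe(byte[] data) {
        pending = data;
    }

    // Copies the payload at submit time, so later caller activity is harmless.
    static void acreateSafe(byte[] data) {
        pending = Arrays.copyOf(data, data.length);
    }

    public static void main(String[] args) {
        byte[] buf = "payload-A".getBytes();
        acreateUnsafe(buf);
        Arrays.fill(buf, (byte) 'X');            // caller reuses the buffer...
        System.out.println(new String(pending)); // ...and the request is corrupted

        byte[] buf2 = "payload-B".getBytes();
        acreateSafe(buf2);
        Arrays.fill(buf2, (byte) 'X');
        System.out.println(new String(pending)); // copy is unaffected
    }
}
```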
| ZooKeeper | ZOOKEEPER-1518 | Mailing List link is broken in the Zookeeper documentation |
Bug | Resolved | Major | Fixed | Patrick D. Hunt | Kiran BC | Kiran BC | 25/Jul/12 05:28 | 01/Aug/12 15:09 | 01/Aug/12 15:09 | 3.4.3 | documentation | 0 | 2 | The Mailing List link under the Miscellaneous section of the ZooKeeper documentation is broken. The broken link is: http://zookeeper.apache.org/mailing_lists.html |
242188 | No Perforce job exists for this issue. | 0 | 12735 | 7 years, 34 weeks, 1 day ago | 0|i02jcf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1517 | zookeeper follower closed |
Bug | Resolved | Major | Invalid | Unassigned | liuli | liuli | 24/Jul/12 08:52 | 02/Aug/12 20:30 | 02/Aug/12 20:17 | 3.3.5 | 3.3.5 | quorum | 0 | 1 | zookeeper version 3.3.5 Hadoop version 0.20.205.0 |
I have Hadoop and Zookeeper installed the zoo.cfg is : tickTime=2000 dataDir=/home/hduser/zookeeper/conf clientPort=2181 initLimit=10 syncLimit=5 server.1=rsmm-master:2888:3888 server.2=rsmm-slave-1:2888:3888 server.3=rsmm-slave-2:2888:3888 server.4=rsmm-slave-3:2888:3888 server.5=rsmm-slave-4:2888:3888 ===================================== I tried to start zookeeper, ./zkServer.sh start ./zkServer.sh status JMX enabled by default Using config: /home/hduser/zookeeper/bin/../conf/zoo.cfg Mode: follower The follower (rsmm-slave-4) logs complain: 012-07-24 20:29:35,903 - WARN [Thread-9:QuorumCnxManager$RecvWorker@727] - Connection broken for id 5, my id = 2, error = java.io.IOException: Channel eof 2012-07-24 20:29:35,904 - WARN [Thread-9:QuorumCnxManager$RecvWorker@730] - Interrupting SendWorker 2012-07-24 20:29:35,905 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645) 2012-07-24 20:29:35,905 - WARN [Thread-8:QuorumCnxManager$SendWorker@633] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2094) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:370) at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622) 2012-07-24 20:29:35,907 - WARN [Thread-8:QuorumCnxManager$SendWorker@642] - Send worker leaving thread 2012-07-24 20:29:35,907 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@165] - shutdown called java.lang.Exception: shutdown Follower at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649) 2012-07-24 20:29:35,913 - INFO [FollowerRequestProcessor:2:FollowerRequestProcessor@93] - FollowerRequestProcessor exited loop! 2012-07-24 20:29:35,914 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@370] - shutdown of request processor complete 2012-07-24 20:29:35,914 - INFO [CommitProcessor:2:CommitProcessor@148] - CommitProcessor exited loop! 2012-07-24 20:29:35,915 - INFO [SyncThread:2:SyncRequestProcessor@151] - SyncRequestProcessor exited! 2012-07-24 20:29:35,916 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), FOLLOWING (my state) 2012-07-24 20:29:35,916 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@621] - LOOKING 2012-07-24 20:29:35,918 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.100000000 2012-07-24 20:29:35,919 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@663] - New election. 
My id = 2, Proposed zxid = 4294967296 2012-07-24 20:29:35,919 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,920 - WARN [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 5 at election address rsmm-slave-4/109.123.121.27:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333) at java.lang.Thread.run(Thread.java:679) 2012-07-24 20:29:35,920 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,922 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,926 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,928 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,932 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), 
LOOKING (my state) 2012-07-24 20:29:35,936 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:29:36,137 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@655] - LEADING 2012-07-24 20:29:36,141 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Leader@55] - TCP NoDelay set to: true 2012-07-24 20:29:36,143 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@154] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/hduser/zookeeper/conf/version-2 snapdir /home/hduser/zookeeper/conf/version-2 2012-07-24 20:29:36,147 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.100000000 2012-07-24 20:29:36,148 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@254] - Snapshotting: 100000000 2012-07-24 20:29:37,149 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@249] - Follower sid: 4 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1c74f37 2012-07-24 20:29:37,150 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@273] - Synchronizing with Follower sid: 4 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0 2012-07-24 20:29:37,151 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x100000000 2012-07-24 20:29:37,152 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@249] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@a17083 2012-07-24 20:29:37,153 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@273] - Synchronizing with Follower sid: 1 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 100000000 2012-07-24 20:29:37,154 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@357] - Sending snapshot last zxid of peer is 
0x100000000 zxid of leader is 0x200000000sent zxid of db as 0x100000000 2012-07-24 20:29:37,156 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@249] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@16fe0f4 2012-07-24 20:29:37,156 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@273] - Synchronizing with Follower sid: 3 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0 2012-07-24 20:29:37,157 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x100000000 2012-07-24 20:29:37,159 - WARN [LearnerHandler-/109.123.121.26:34087:Leader@492] - Commiting zxid 0x200000000 from /109.123.121.24:2888 not first! 2012-07-24 20:29:37,160 - WARN [LearnerHandler-/109.123.121.26:34087:Leader@494] - First is 0 2012-07-24 20:29:37,172 - INFO [LearnerHandler-/109.123.121.26:34087:Leader@518] - Have quorum of supporters; starting up and setting last processed zxid: 8589934592 2012-07-24 20:30:40,397 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state) 2012-07-24 20:30:40,397 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state) 2012-07-24 20:30:40,398 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state) 2012-07-24 20:30:40,400 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state) 2012-07-24 20:30:40,641 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@249] - Follower sid: 5 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@15663a2 2012-07-24 20:30:40,642 
- INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@273] - Synchronizing with Follower sid: 5 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0 2012-07-24 20:30:40,642 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x200000000 2012-07-24 20:30:37,768 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /home/hduser/zookeeper/bin/../conf/zoo.cfg 2012-07-24 20:30:37,774 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums 2012-07-24 20:30:37,792 - INFO [main:QuorumPeerMain@119] - Starting quorum peer 2012-07-24 20:30:37,820 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181 2012-07-24 20:30:37,845 - INFO [main:QuorumPeer@819] - tickTime set to 2000 2012-07-24 20:30:37,845 - INFO [main:QuorumPeer@830] - minSessionTimeout set to -1 2012-07-24 20:30:37,846 - INFO [main:QuorumPeer@841] - maxSessionTimeout set to -1 2012-07-24 20:30:37,846 - INFO [main:QuorumPeer@856] - initLimit set to 10 2012-07-24 20:30:37,863 - INFO [main:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.0 2012-07-24 20:30:37,895 - INFO [Thread-1:QuorumCnxManager$Listener@473] - My election bind port: 3888 2012-07-24 20:30:37,909 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@621] - LOOKING 2012-07-24 20:30:37,912 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@663] - New election. 
My id = 5, Proposed zxid = 0 2012-07-24 20:30:37,923 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,923 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,924 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,924 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,925 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@721] - Updating proposal 2012-07-24 20:30:37,928 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 5 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,929 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,929 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,931 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 1 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,932 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,932 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LEADING 
(n.state), 2 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,933 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,933 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,934 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,935 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,935 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,936 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,937 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LEADING (n.state), 2 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,937 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,938 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,938 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LEADING (n.state), 2 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,938 - INFO [WorkerReceiver 
Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,939 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,939 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LEADING (n.state), 2 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,939 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,940 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,941 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,941 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,941 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,942 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,942 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:30:37,942 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), 
FOLLOWING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:30:38,143 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@643] - FOLLOWING 2012-07-24 20:30:38,150 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@80] - TCP NoDelay set to: true 2012-07-24 20:30:38,157 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:zookeeper.version=3.3.5-1301095, built on 03/15/2012 19:48 GMT 2012-07-24 20:30:38,157 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:host.name=rsmm-slave-4 2012-07-24 20:30:38,158 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.version=1.6.0_23 2012-07-24 20:30:38,158 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.vendor=Sun Microsystems Inc. 2012-07-24 20:30:38,158 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.home=/usr/lib/jvm/java-6-openjdk/jre 2012-07-24 20:30:38,159 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.class.path=/home/hduser/zookeeper/bin/../build/classes:/home/hduser/zookeeper/bin/../build/lib/*.jar:/home/hduser/zookeeper/bin/../zookeeper-3.3.5.jar:/home/hduser/zookeeper/bin/../lib/log4j-1.2.15.jar:/home/hduser/zookeeper/bin/../lib/jline-0.9.94.jar:/home/hduser/zookeeper/bin/../src/java/lib/*.jar:/home/hduser/zookeeper/bin/../conf: 2012-07-24 20:30:38,159 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.library.path=/usr/lib/jvm/java-6-openjdk/jre/lib/i386/client:/usr/lib/jvm/java-6-openjdk/jre/lib/i386:/usr/lib/jvm/java-6-openjdk/jre/../lib/i386:/usr/java/packages/lib/i386:/usr/lib/jni:/lib:/usr/lib 2012-07-24 20:30:38,159 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.io.tmpdir=/tmp 2012-07-24 20:30:38,159 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.compiler=<NA> 2012-07-24 20:30:38,160 - INFO 
[QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.name=Linux 2012-07-24 20:30:38,160 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.arch=i386 2012-07-24 20:30:38,160 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.version=3.0.0-12-generic 2012-07-24 20:30:38,160 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.name=hduser 2012-07-24 20:30:38,160 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.home=/home/hduser 2012-07-24 20:30:38,161 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.dir=/home/hduser/zookeeper/bin 2012-07-24 20:30:38,162 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@154] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/hduser/zookeeper/conf/version-2 snapdir /home/hduser/zookeeper/conf/version-2 2012-07-24 20:30:38,175 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@294] - Getting a snapshot from leader 2012-07-24 20:30:38,179 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@326] - Setting leader epoch 2 2012-07-24 20:30:38,180 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@254] - Snapshotting: 200000000 2012-07-24 20:30:46,564 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:41116 2012-07-24 20:30:46,569 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing srvr command from /127.0.0.1:41116 2012-07-24 20:30:46,573 - INFO [Thread-10:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:41116 (no session established for client) 2012-07-24 20:33:27,407 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:41118 2012-07-24 20:33:27,408 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - 
Processing srvr command from /127.0.0.1:41118 2012-07-24 20:33:27,411 - INFO [Thread-11:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:41118 (no session established for client) 2012-07-24 20:47:21,659 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:41126 2012-07-24 20:47:21,660 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing srvr command from /127.0.0.1:41126 2012-07-24 20:47:21,663 - INFO [Thread-12:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:41126 (no session established for client) ================================== while the leader 's log shows 2012-07-24 20:22:33,769 - INFO [main:QuorumPeerConfig@90] - Reading configuration from: /home/hduser/zookeeper/bin/../conf/zoo.cfg 2012-07-24 20:22:33,776 - INFO [main:QuorumPeerConfig@310] - Defaulting to majority quorums 2012-07-24 20:22:33,795 - INFO [main:QuorumPeerMain@119] - Starting quorum peer 2012-07-24 20:22:33,827 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181 2012-07-24 20:22:33,854 - INFO [main:QuorumPeer@819] - tickTime set to 2000 2012-07-24 20:22:33,854 - INFO [main:QuorumPeer@830] - minSessionTimeout set to -1 2012-07-24 20:22:33,855 - INFO [main:QuorumPeer@841] - maxSessionTimeout set to -1 2012-07-24 20:22:33,855 - INFO [main:QuorumPeer@856] - initLimit set to 10 2012-07-24 20:22:33,874 - INFO [main:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.100000000 2012-07-24 20:22:33,905 - INFO [Thread-1:QuorumCnxManager$Listener@473] - My election bind port: 3888 2012-07-24 20:22:33,923 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@621] - LOOKING 2012-07-24 20:22:33,926 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@663] - New election. 
My id = 2, Proposed zxid = 4294967296 2012-07-24 20:22:33,935 - INFO [WorkerSender Thread:QuorumCnxManager@183] - Have smaller server identifier, so dropping the connection: (3, 2) 2012-07-24 20:22:33,935 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 1 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state) 2012-07-24 20:22:33,936 - INFO [WorkerSender Thread:QuorumCnxManager@183] - Have smaller server identifier, so dropping the connection: (4, 2) 2012-07-24 20:22:33,937 - INFO [WorkerSender Thread:QuorumCnxManager@183] - Have smaller server identifier, so dropping the connection: (5, 2) 2012-07-24 20:22:33,938 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-24 20:22:33,939 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-24 20:22:33,941 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:22:33,941 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:22:33,942 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), FOLLOWING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:22:33,943 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:22:33,945 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LEADING (n.state), 5 (n.sid), LOOKING (my state) 2012-07-24 20:22:33,945 - INFO [WorkerReceiver 
Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LEADING (n.state), 5 (n.sid), FOLLOWING (my state) 2012-07-24 20:22:33,946 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@643] - FOLLOWING 2012-07-24 20:22:33,952 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@80] - TCP NoDelay set to: true 2012-07-24 20:22:33,959 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:zookeeper.version=3.3.5-1301095, built on 03/15/2012 19:48 GMT 2012-07-24 20:22:33,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:host.name=rsmm-slave-1 2012-07-24 20:22:33,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.version=1.6.0_23 2012-07-24 20:22:33,960 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.vendor=Sun Microsystems Inc. 2012-07-24 20:22:33,961 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.home=/usr/lib/jvm/java-6-openjdk/jre 2012-07-24 20:22:33,961 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.class.path=/home/hduser/zookeeper/bin/../build/classes:/home/hduser/zookeeper/bin/../build/lib/*.jar:/home/hduser/zookeeper/bin/../zookeeper-3.3.5.jar:/home/hduser/zookeeper/bin/../lib/log4j-1.2.15.jar:/home/hduser/zookeeper/bin/../lib/jline-0.9.94.jar:/home/hduser/zookeeper/bin/../src/java/lib/*.jar:/home/hduser/zookeeper/bin/../conf: 2012-07-24 20:22:33,961 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.library.path=/usr/lib/jvm/java-6-openjdk/jre/lib/i386/client:/usr/lib/jvm/java-6-openjdk/jre/lib/i386:/usr/lib/jvm/java-6-openjdk/jre/../lib/i386:/usr/java/packages/lib/i386:/usr/lib/jni:/lib:/usr/lib 2012-07-24 20:22:33,961 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:java.io.tmpdir=/tmp 2012-07-24 20:22:33,962 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server 
environment:java.compiler=<NA> 2012-07-24 20:22:33,962 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.name=Linux 2012-07-24 20:22:33,962 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.arch=i386 2012-07-24 20:22:33,962 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:os.version=3.0.0-12-generic 2012-07-24 20:22:33,962 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.name=hduser 2012-07-24 20:22:33,963 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.home=/home/hduser 2012-07-24 20:22:33,963 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Environment@97] - Server environment:user.dir=/home/hduser/zookeeper/bin 2012-07-24 20:22:33,965 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@154] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/hduser/zookeeper/conf/version-2 snapdir /home/hduser/zookeeper/conf/version-2 2012-07-24 20:22:33,977 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@291] - Getting a diff from the leader 0x100000000 2012-07-24 20:22:33,981 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Learner@326] - Setting leader epoch 1 2012-07-24 20:22:33,983 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@254] - Snapshotting: 100000000 2012-07-24 20:22:40,102 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:41400 2012-07-24 20:22:40,106 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing srvr command from /127.0.0.1:41400 2012-07-24 20:22:40,109 - INFO [Thread-10:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:41400 (no session established for client) 2012-07-24 20:29:35,903 - WARN [Thread-9:QuorumCnxManager$RecvWorker@727] - Connection broken for id 5, my id = 2, error = java.io.IOException: Channel eof 2012-07-24 20:29:35,904 
- WARN [Thread-9:QuorumCnxManager$RecvWorker@730] - Interrupting SendWorker 2012-07-24 20:29:35,905 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@82] - Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:392) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:78) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645) 2012-07-24 20:29:35,905 - WARN [Thread-8:QuorumCnxManager$SendWorker@633] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2094) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:370) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:622) 2012-07-24 20:29:35,907 - WARN [Thread-8:QuorumCnxManager$SendWorker@642] - Send worker leaving thread 2012-07-24 20:29:35,907 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@165] - shutdown called java.lang.Exception: shutdown Follower at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649) 2012-07-24 20:29:35,913 - INFO [FollowerRequestProcessor:2:FollowerRequestProcessor@93] - FollowerRequestProcessor exited loop! 
2012-07-24 20:29:35,914 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@370] - shutdown of request processor complete 2012-07-24 20:29:35,914 - INFO [CommitProcessor:2:CommitProcessor@148] - CommitProcessor exited loop! 2012-07-24 20:29:35,915 - INFO [SyncThread:2:SyncRequestProcessor@151] - SyncRequestProcessor exited! 2012-07-24 20:29:35,916 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), FOLLOWING (my state) 2012-07-24 20:29:35,916 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@621] - LOOKING 2012-07-24 20:29:35,918 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.100000000 2012-07-24 20:29:35,919 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@663] - New election. My id = 2, Proposed zxid = 4294967296 2012-07-24 20:29:35,919 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,920 - WARN [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 5 at election address rsmm-slave-4/109.123.121.27:3888 java.net.ConnectException: Connection refused at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:118) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333) at java.lang.Thread.run(Thread.java:679) 2012-07-24 20:29:35,920 - INFO 
[WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,922 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,926 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,928 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 4 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,932 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-24 20:29:35,936 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-24 20:29:36,137 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@655] - LEADING 2012-07-24 20:29:36,141 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Leader@55] - TCP NoDelay set to: true 2012-07-24 20:29:36,143 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@154] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /home/hduser/zookeeper/conf/version-2 snapdir /home/hduser/zookeeper/conf/version-2 2012-07-24 20:29:36,147 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileSnap@82] - Reading snapshot /home/hduser/zookeeper/conf/version-2/snapshot.100000000 2012-07-24 20:29:36,148 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@254] - Snapshotting: 100000000 2012-07-24 20:29:37,149 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@249] - Follower sid: 4 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@1c74f37 
2012-07-24 20:29:37,150 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@273] - Synchronizing with Follower sid: 4 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0 2012-07-24 20:29:37,151 - INFO [LearnerHandler-/109.123.121.26:34087:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x100000000 2012-07-24 20:29:37,152 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@249] - Follower sid: 1 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@a17083 2012-07-24 20:29:37,153 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@273] - Synchronizing with Follower sid: 1 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 100000000 2012-07-24 20:29:37,154 - INFO [LearnerHandler-/109.123.121.23:41659:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x100000000 zxid of leader is 0x200000000sent zxid of db as 0x100000000 2012-07-24 20:29:37,156 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@249] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@16fe0f4 2012-07-24 20:29:37,156 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@273] - Synchronizing with Follower sid: 3 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0 2012-07-24 20:29:37,157 - INFO [LearnerHandler-/109.123.121.25:54707:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x100000000 2012-07-24 20:29:37,159 - WARN [LearnerHandler-/109.123.121.26:34087:Leader@492] - Commiting zxid 0x200000000 from /109.123.121.24:2888 not first! 
2012-07-24 20:29:37,160 - WARN [LearnerHandler-/109.123.121.26:34087:Leader@494] - First is 0 2012-07-24 20:29:37,172 - INFO [LearnerHandler-/109.123.121.26:34087:Leader@518] - Have quorum of supporters; starting up and setting last processed zxid: 8589934592 2012-07-24 20:30:40,397 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 5 (n.leader), 0 (n.zxid), 1 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state) 2012-07-24 20:30:40,397 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state) 2012-07-24 20:30:40,398 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state) 2012-07-24 20:30:40,400 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 4294967296 (n.zxid), 2 (n.round), LOOKING (n.state), 5 (n.sid), LEADING (my state) 2012-07-24 20:30:40,641 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@249] - Follower sid: 5 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@15663a2 2012-07-24 20:30:40,642 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@273] - Synchronizing with Follower sid: 5 maxCommittedLog =0 minCommittedLog = 0 peerLastZxid = 0 2012-07-24 20:30:40,642 - INFO [LearnerHandler-/109.123.121.27:34526:LearnerHandler@357] - Sending snapshot last zxid of peer is 0x0 zxid of leader is 0x200000000sent zxid of db as 0x200000000 2012-07-24 20:49:19,788 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:41403 2012-07-24 20:49:19,789 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing srvr command from /127.0.0.1:41403 2012-07-24 20:49:19,791 - INFO [Thread-18:NIOServerCnxn@1435] - Closed socket connection for client /127.0.0.1:41403 (no session 
established for client) |
242189 | No Perforce job exists for this issue. | 0 | 12736 | 7 years, 34 weeks ago | I will close this one since zookeeper start normally | 0|i02jcn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
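The zxids in the election logs above (4294967296, 8589934592) are easier to read in hex: ZooKeeper packs the leader epoch into the high 32 bits of the zxid and a per-epoch counter into the low 32 bits, so 4294967296 is 0x100000000 (epoch 1, counter 0) and 8589934592 is 0x200000000 (epoch 2, counter 0) — which matches the "Setting leader epoch 2" and "Snapshotting: 200000000" lines. A minimal sketch of that decoding (class and method names are illustrative, not part of ZooKeeper's API):

```java
// Illustrative decoder for ZooKeeper zxids: the high 32 bits hold the
// leader epoch, the low 32 bits a counter within that epoch.
// Class/method names are hypothetical, not ZooKeeper API.
public class ZxidDecode {
    static long epoch(long zxid)   { return zxid >>> 32; }
    static long counter(long zxid) { return zxid & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        long[] seen = {4294967296L, 8589934592L}; // values from the logs above
        for (long z : seen) {
            System.out.printf("zxid 0x%x -> epoch %d, counter %d%n",
                    z, epoch(z), counter(z));
        }
    }
}
```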
| ZooKeeper | ZOOKEEPER-1516 | Configurable finalizeWait for FastLeaderElection |
Improvement | Open | Major | Unresolved | Unassigned | Ivan Babrou | Ivan Babrou | 24/Jul/12 07:21 | 25/Jul/12 01:50 | 3.3.5 | leaderElection, quorum, server | 1 | 3 | Gentoo linux, any environment is affected. | FastLeaderElection has final static int finalizeWait = 200. This is the time to wait after a successful leader election. I don't know what could happen, but 200ms is too slow for a production environment under heavy load. I changed it to 20ms and everything still works for me. I propose making this value configurable with a default of 200 so current installations are not affected. Combined with #ZOOKEEPER-1515 it could make leader election roughly 10x faster: 1500ms -> 180ms, including 100ms for 2 failed new-leader connections. |
performance | 242190 | No Perforce job exists for this issue. | 0 | 12737 | 7 years, 35 weeks, 1 day ago | 0|i02jcv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
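The change proposed in ZOOKEEPER-1516 amounts to reading the wait from configuration with the current hard-coded 200ms as the default. A sketch using a JVM system property; the property name below is a placeholder assumption, not a real ZooKeeper configuration key:

```java
// Sketch of a configurable finalizeWait as proposed in ZOOKEEPER-1516.
// The system property name "zookeeper.finalizeWait" is a placeholder
// assumption, not an actual ZooKeeper setting.
public class FinalizeWaitConfig {
    static final int DEFAULT_FINALIZE_WAIT_MS = 200; // current hard-coded value

    // Returns the configured wait, falling back to the 200ms default so
    // existing installations are unaffected.
    static int finalizeWait() {
        return Integer.getInteger("zookeeper.finalizeWait", DEFAULT_FINALIZE_WAIT_MS);
    }

    public static void main(String[] args) {
        System.out.println("finalizeWait = " + finalizeWait() + "ms");
    }
}
```

Running with `-Dzookeeper.finalizeWait=20` would then reproduce the reporter's 20ms experiment without a recompile.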
| ZooKeeper | ZOOKEEPER-1515 | Long reconnect timeout if leader failed. |
Improvement | Open | Major | Unresolved | Unassigned | Ivan Babrou | Ivan Babrou | 24/Jul/12 01:58 | 25/Jul/12 01:45 | 3.3.5 | leaderElection, quorum, server | 1 | 4 | Gentoo linux, but every environment is affected. | In ZooKeeper 3.3.5, file src/java/main/org/apache/zookeeper/server/quorum/Learner.java:325 contains Thread.sleep(1000); This always happens after a leader failure or restart: ZooKeeper re-elects a new leader and all followers try to connect to it, but the first attempt always fails with "Connection refused": {quote} 2012-07-23 18:55:48,159 - WARN [QuorumPeer:/0.0.0.0:2181:Learner@229] - Unexpected exception, tries=0, connecting to web329.local/192.168.1.74:2888 java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351) at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213) at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) at java.net.Socket.connect(Socket.java:529) at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:221) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:65) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645) {quote} I propose changing this line to the following code: {code:title=Learner.java|borderStyle=solid} if (tries > 0) { Thread.sleep(self.tickTime); } {code} This way the first reconnect attempt is made immediately, and subsequent attempts wait for the tick time (a good semantic change, I suppose). The result of this change: leader re-election time dropped from >1500ms to 300-400ms with a 50ms tick time. This is pretty important for our production environment and will not break any existing installations. |
patch, performance | 242191 | No Perforce job exists for this issue. | 0 | 12738 | 7 years, 35 weeks, 1 day ago | 0|i02jd3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1514 | FastLeaderElection - leader ignores the round information when joining a quorum |
Bug | Resolved | Critical | Fixed | Flavio Paiva Junqueira | Patrick D. Hunt | Patrick D. Hunt | 19/Jul/12 20:52 | 03/Aug/12 06:55 | 02/Aug/12 18:29 | 3.3.4 | 3.4.4, 3.5.0 | quorum | 0 | 6 | In the following case we have a 3 server ensemble. Initially all is well, zk3 is the leader. However zk3 fails, restarts, and rejoins the quorum as the new leader (was the old leader, still the leader after re-election). The existing two followers, zk1 and zk2, rejoin the new quorum again as followers of zk3. zk1 then fails, its data directory is deleted (so it has no state whatsoever) and it is restarted. However zk1 can never rejoin the quorum (even after an hour). During this time zk2 and zk3 are serving properly. Later all three servers are restarted and properly form a functional quorum. Here are some interesting log snippets. Nothing else of interest was seen in the logs during this time: zk3. This is where it becomes the leader after failing initially (as the leader). Notice the "round" is ahead of zk1 and zk2: {noformat} 2012-07-18 17:19:35,423 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id = 3, Proposed zxid = 77309411648 2012-07-18 17:19:35,423 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 (n.zxid), 832 (n.round), LOOKING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-18 17:19:35,424 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 73014444480 (n.zxid), 831 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state) 2012-07-18 17:19:35,424 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 73014444480 (n.zxid), 831 (n.round), FOLLOWING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-18 17:19:35,424 - INFO [QuorumPeer:/0.0.0.0:2181:QuorumPeer@655] - LEADING {noformat} zk1 which won't come back. 
Notice that zk3 is reporting the round as 831, while zk2 thinks that the round is 832: {noformat} 2012-07-18 17:31:12,015 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 77309411648 (n.zxid), 1 (n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state) 2012-07-18 17:31:12,016 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 73014444480 (n.zxid), 831 (n.round), LEADING (n.state), 3 (n.sid), LOOKING (my state) 2012-07-18 17:31:12,017 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 3 (n.leader), 77309411648 (n.zxid), 832 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state) 2012-07-18 17:31:15,219 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@697] - Notification time out: 6400 {noformat} |
242192 | No Perforce job exists for this issue. | 4 | 12739 | 7 years, 33 weeks, 6 days ago | 0|i02jdb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1513 | "Unreasonable length" exception while starting a server. |
Bug | Closed | Major | Fixed | Skye Wanderman-Milne | Patrick D. Hunt | Patrick D. Hunt | 19/Jul/12 20:38 | 13/Mar/14 14:16 | 12/Dec/12 01:52 | 3.3.4 | 3.4.6, 3.5.0 | server | 0 | 10 | The server is allowing a client to set data larger than the server can then later read: {noformat} 2012-07-18 14:28:12,555 - FATAL [main:QuorumPeer@400] - Unable to load database on disk java.io.IOException: Unreasonable length = 1048583 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100) at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:232) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:131) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) 2012-07-18 14:28:12,555 - FATAL [main:QuorumPeerMain@87] - Unexpected exception, exiting abnormally java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:401) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76) Caused by: java.io.IOException: 
Unreasonable length = 1048583 at org.apache.jute.BinaryInputArchive.readBuffer(BinaryInputArchive.java:100) at org.apache.zookeeper.server.persistence.Util.readTxnBytes(Util.java:232) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:602) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:131) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) ... 3 more {noformat} Notice the size is 0x100007: 7 bytes beyond the default jute.maxbuffer of 0x100000 (1048576). The SetDataTxn contains the client data plus a couple of extra fields. On ingest the server applies the jute.maxbuffer limit to the data (expected) but does not account for the fact that the data plus these extra fields may exceed the jute.maxbuffer check when reading from disk. The workaround was simple here: set jute.maxbuffer a bit higher (and fix the misbehaving client; the expectation was not that the data would grow this large). |
242193 | No Perforce job exists for this issue. | 4 | 12740 | 6 years, 2 weeks ago |
Reviewed
|
0|i02jdj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
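The 7-byte overshoot in ZOOKEEPER-1513 is just the default jute.maxbuffer (1 MiB = 1048576 bytes) plus the extra framing the SetDataTxn record adds around the client payload, so the read path sees 1048583 bytes and rejects it. A small sketch of that arithmetic (the 7-byte overhead is taken from the report above, not computed from the wire format):

```java
// Arithmetic behind the "Unreasonable length = 1048583" failure in
// ZOOKEEPER-1513: a client payload of exactly jute.maxbuffer bytes plus
// the txn's extra fields exceeds the same limit applied on read-back.
// The 7-byte overhead is the figure from the report, not a derived value.
public class MaxBufferOverflow {
    static final int JUTE_MAX_BUFFER = 1 << 20; // 1048576, the default limit

    // Length the log-read path sees: payload plus the txn's framing bytes.
    static int onDiskLength(int payloadBytes, int txnOverheadBytes) {
        return payloadBytes + txnOverheadBytes;
    }

    public static void main(String[] args) {
        int len = onDiskLength(JUTE_MAX_BUFFER, 7);
        System.out.println("on-disk length = " + len
                + ", exceeds limit: " + (len > JUTE_MAX_BUFFER));
    }
}
```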
| ZooKeeper | ZOOKEEPER-1512 | Reduce log level of missing ZookeeperSaslClient Security Exception |
Bug | Open | Major | Unresolved | Unassigned | Micah Whitacre | Micah Whitacre | 18/Jul/12 14:26 | 23/May/14 08:00 | java client | 0 | 5 | ZOOKEEPER-1623, ZOOKEEPER-1657, ZOOKEEPER-1510 | When running the Java client you frequently get messages like the following: org.apache.zookeeper.client.ZooKeeperSaslClient SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration. In cases where we don't want this configuration enabled, the logs get spammed with this message. Its scope should be lowered to debug/trace to prevent flooding the logs. |
242194 | No Perforce job exists for this issue. | 0 | 12741 | 5 years, 43 weeks, 6 days ago | 0|i02jdr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1511 | Symbolic nodes |
Wish | Open | Major | Unresolved | Unassigned | Sheetal Parade | Sheetal Parade | 17/Jul/12 14:33 | 17/Jul/12 14:33 | 0 | 1 | ZooKeeper currently allows two types of nodes: EPHEMERAL and PERSISTENT. If a node or its data needs to be referenced from other nodes, the entire node must be copied to a new location, and there is no relation between the original and copied nodes. Symbolic nodes, like symbolic links in a Unix directory structure, would keep the nodes and node data in sync. Use case: while implementing managed clusters for a micro-shards strategy, clients can register by creating ephemeral nodes. The master process can then create new symbolic nodes alongside other clients' nodes for the next set of processes to watch. If a client goes down, its ephemeral node cleans itself up along with the symbolic node. Different sets of watchers on the symbolic nodes would then get notified. |
242195 | No Perforce job exists for this issue. | 0 | 12742 | 7 years, 36 weeks, 2 days ago | 0|i02jdz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1510 | Should not log SASL errors for non-secure usage |
Improvement | Resolved | Minor | Fixed | Todd Lipcon | Todd Lipcon | Todd Lipcon | 16/Jul/12 15:45 | 25/Sep/13 12:00 | 01/Aug/12 17:41 | 3.4.3 | 3.4.4, 3.5.0 | java client | 0 | 5 | ZOOKEEPER-1512, ZOOKEEPER-1623 | Since SASL support was added, all connections with non-secure clients have started logging messages like: 2012-07-01 02:13:34,986 WARN org.apache.zookeeper.client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration. 2012-07-01 02:13:34,986 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration. Despite the "you may ignore this" qualifier, I've seen a lot of users confused by this message. Instead, it would be better to either log at DEBUG level, or piggy back the SASL information onto the "Opening socket connection" message (eg "Opening socket connection to X:2181. Will not use SASL because no configuration was located.") |
242014 | No Perforce job exists for this issue. | 2 | 12499 | 6 years, 26 weeks, 1 day ago |
Reviewed
|
0|i02hvz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1509 | Please update documentation to reflect updated FreeBSD support. |
Task | Resolved | Major | Fixed | George Neville-Neil | George Neville-Neil | George Neville-Neil | 09/Jul/12 23:26 | 18/Nov/15 18:27 | 09/Oct/13 12:50 | 3.4.6, 3.5.0 | 3.5.0 | 0 | 3 | ZOOKEEPER-1996 | I noticed on this page: http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html that FreeBSD was listed as being supported only as a client due to a problem with the JVM on FreeBSD. As of Friday ZooKeeper is fully supported on FreeBSD using OpenJDK 7, and I have created a port for it in our ports collection: http://www.freshports.org/devel/zookeeper/ The zookeeper port tracks the stable release at the moment; in the near future a separate port will track the current release, while the plain zookeeper port continues to track the stable. Please update your documentation to reflect this change in support. Best, George Neville-Neil gnn@freebsd.org |
newbie | 242196 | No Perforce job exists for this issue. | 2 | 12743 | 6 years, 24 weeks ago | Update documentation to reflect full FreeBSD support. |
Reviewed
|
0|i02je7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1508 | Reliable standalone mode through redundant databases |
New Feature | Open | Major | Unresolved | Unassigned | Bill Bridge | Bill Bridge | 09/Jul/12 19:12 | 12/Jul/12 17:54 | 0 | 3 | Single server with multiple disks or two node cluster with multiple shared disks | Currently ZooKeeper requires 3 servers to provide both reliability and availability. This is fine for large internet scale clusters, but there are lots of two node clusters that could benefit from ZooKeeper. There are also single server use cases where it is highly desirable to have ZooKeeper survive a disk failure, but availability is not as important. This feature would allow the configuration of multiple destinations for logs and snapshots. A transaction is committed when a majority of the log writes complete successfully. If one log gets an error on write, then it is taken offline until an administrator brings it online or replaces it with a new destination. ZooKeeper continues to run as long as a quorum of disks can be written. High availability can be provided with a two node cluster. When the ZooKeeper node dies, the disks are switched to the surviving node and a new ZooKeeper starts. Faster switch over can be done if there is an observer already running in the new node. |
242197 | No Perforce job exists for this issue. | 0 | 12744 | 7 years, 37 weeks ago | 0|i02jef: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1507 | review reading epoch files, improve logging |
Improvement | Open | Major | Unresolved | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 06/Jul/12 13:00 | 06/Jul/12 13:00 | 3.4.3, 3.5.0 | server | 0 | 1 | When reading an epoch file we should log (error level) any problems encountered. org.apache.zookeeper.server.quorum.QuorumPeer.readLongFromFile(String) At the same time let's verify the call paths are handled properly. |
242198 | No Perforce job exists for this issue. | 0 | 12745 | 7 years, 37 weeks, 6 days ago | 0|i02jen: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1506 | Re-try DNS hostname -> IP resolution if node connection fails |
Improvement | Resolved | Blocker | Fixed | Robert P. Thille | Mike Heffner | Mike Heffner | 06/Jul/12 12:23 | 18/Jun/18 00:05 | 23/Sep/15 13:19 | 3.4.5, 3.4.6 | 3.4.7, 3.5.0, 3.6.0 | server | 29 | 49 | ZOOKEEPER-2982, ZOOKEEPER-2319, ZOOKEEPER-1846, ZOOKEEPER-2184 | Ubuntu 11.04 64-bit | In our zoo.cfg we use hostnames to identify the ZK servers that are part of an ensemble. These hostnames are configured with a low (<= 60s) TTL and the IP address they map to can and does change. Our procedure for replacing/upgrading a ZK node is to boot an entirely new instance and remap the hostname to the new instance's IP address. Our expectation is that when the original ZK node is terminated/shutdown, the remaining nodes in the ensemble would reconnect to the new instance. However, what we are noticing is that the remaining ZK nodes do not attempt to re-resolve the hostname->IP mapping for the new server. Once the original ZK node is terminated, the existing servers continue to attempt contacting it at the old IP address. It would be great if the ZK servers could try to re-resolve the hostname when attempting to connect to a lost ZK server, instead of caching the lookup indefinitely. Currently we must do a rolling restart of the ZK ensemble after swapping a node -- which at three nodes means we periodically lose quorum. The exact method we are following is to boot new instances in EC2 and attach one, of a set of three, Elastic IP address. External to EC2 this IP address remains the same and maps to whatever instance it is attached to. Internal to EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped to the internal (10.x.y.z) address of the instance it is attached to. Therefore, in our case we would like ZK to pickup the new 10.x.y.z address that the elastic IP hostname gets mapped to and reconnect appropriately. |
patch | 242199 | No Perforce job exists for this issue. | 14 | 12746 | 4 years, 24 weeks, 3 days ago | Tests pass with this patch. This patch is for the branch-3.4 branch ONLY. |
0|i02jev: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
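The ZOOKEEPER-1506 report above asks that servers re-resolve a peer's hostname on every reconnect attempt instead of caching the first lookup. A minimal Python sketch of that idea (function name and structure are illustrative, not ZooKeeper's actual Java implementation):

```python
import socket

def connect_with_fresh_dns(hostname, port, timeout=5.0):
    """Re-resolve hostname on every connection attempt so a remapped DNS
    record (e.g. an EC2 elastic IP hostname with a 45-60s TTL, as in the
    report) is picked up, rather than reusing an address cached at startup."""
    # getaddrinfo consults the resolver on each call (subject to OS caching)
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            hostname, port, type=socket.SOCK_STREAM):
        try:
            # sockaddr[:2] is (ip, port) for IPv4/IPv6
            return socket.create_connection(sockaddr[:2], timeout=timeout)
        except OSError:
            continue  # try the next resolved address
    raise OSError("no resolved address for %s:%d was reachable" % (hostname, port))
```

Calling this on each retry, rather than once, is what lets the ensemble follow a hostname to a replacement node without a rolling restart.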
| ZooKeeper | ZOOKEEPER-1505 | Multi-thread CommitProcessor |
Improvement | Resolved | Major | Fixed | Jay Shrauner | Jay Shrauner | Jay Shrauner | 05/Jul/12 19:30 | 22/Dec/12 15:42 | 07/Dec/12 18:38 | 3.4.3, 3.4.4, 3.5.0 | 3.5.0 | server | 1 | 9 | CommitProcessor has a single thread that both pulls requests off its queues and runs all downstream processors. This is noticeably inefficient for read-intensive workloads, which could be run concurrently. The trick is handling write transactions. I propose multi-threading this code according to the following two constraints - each session must see its requests responded to in order - all committed transactions must be handled in zxid order, across all sessions I believe these cover the only constraints we need to honor. In particular, I believe we can relax the following: - it does not matter if the read request in one session happens before or after the write request in another session With these constraints, I propose the following threads - 1 primary queue servicing/work dispatching thread - 0-N assignable worker threads, where a given session is always assigned to the same worker thread By assigning sessions always to the same worker thread (using a simple sessionId mod number of worker threads), we guarantee the first constraint-- requests we push onto the thread queue are processed in order. The way we guarantee the second constraint is we only allow a single commit transaction to be in flight at a time--the queue servicing thread blocks while a commit transaction is in flight, and when the transaction completes it clears the flag. On a 32 core machine running Linux 2.6.38, achieved best performance with 32 worker threads for a 56% +/- 5% improvement in throughput (this improvement was measured on top of that for ZOOKEEPER-1504, not in isolation). New classes introduced in this patch are: WorkerService (also in ZOOKEEPER-1504): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. 
Supports assignable threads (as used here) and non-assignable threads (as used by NIOServerCnxnFactory). |
performance, scaling | 239679 | No Perforce job exists for this issue. | 4 | 2534 | 7 years, 13 weeks, 5 days ago |
Reviewed
|
0|i00sen: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
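The ZOOKEEPER-1505 dispatch rule above (a session always maps to the same worker via sessionId mod N) can be sketched compactly. This is an illustrative Python model of the ordering guarantee, not ZooKeeper's actual WorkerService code; class and method names are made up:

```python
from queue import Queue
from threading import Thread

class SessionWorkerPool:
    """Requests for a given session always go to the same worker
    (session_id % num_workers), so each session sees its requests
    processed in submission order even though different sessions
    run concurrently -- the first constraint in the proposal above."""

    def __init__(self, num_workers, handler):
        self.queues = [Queue() for _ in range(num_workers)]
        self.threads = [Thread(target=self._run, args=(q, handler), daemon=True)
                        for q in self.queues]
        for t in self.threads:
            t.start()

    def submit(self, session_id, request):
        # deterministic assignment: same session -> same FIFO queue
        self.queues[session_id % len(self.queues)].put(request)

    def _run(self, q, handler):
        while True:
            req = q.get()
            if req is None:  # shutdown sentinel
                break
            handler(req)

    def shutdown(self):
        for q in self.queues:
            q.put(None)
        for t in self.threads:
            t.join()
```

The second constraint (zxid ordering of commits) would sit in the dispatching thread, which stalls while a commit is in flight; that part is omitted here.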
| ZooKeeper | ZOOKEEPER-1504 | Multi-thread NIOServerCnxn |
Improvement | Resolved | Major | Fixed | Thawan Kooburat | Jay Shrauner | Jay Shrauner | 05/Jul/12 18:53 | 24/Jul/17 00:36 | 24/Jul/17 00:36 | 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.6.0 | server | 3 | 19 | ZOOKEEPER-1347, ZOOKEEPER-1620 | NIOServerCnxnFactory is single threaded, which doesn't scale well to large numbers of clients. This is particularly noticeable when thousands of clients connect. I propose multi-threading this code as follows: - 1 acceptor thread, for accepting new connections - 1-N selector threads - 0-M I/O worker threads Numbers of threads are configurable, with defaults scaling according to number of cores. Communication with the selector threads is handled via LinkedBlockingQueues, and connections are permanently assigned to a particular selector thread so that all potentially blocking SelectionKey operations can be performed solely by the selector thread. An ExecutorService is used for the worker threads. On a 32 core machine running Linux 2.6.38, achieved best performance with 4 selector threads and 64 worker threads for a 70% +/- 5% improvement in throughput. This patch incorporates and supersedes the patches for https://issues.apache.org/jira/browse/ZOOKEEPER-517 https://issues.apache.org/jira/browse/ZOOKEEPER-1444 New classes introduced in this patch are: - ExpiryQueue (from ZOOKEEPER-1444): factor out the logic from SessionTrackerImpl used to expire sessions so that the same logic can be used to expire connections - RateLogger (from ZOOKEEPER-517): rate limit error message logging, currently only used to throttle rate of logging "out of file descriptors" errors - WorkerService (also in ZOOKEEPER-1505): ExecutorService wrapper that makes worker threads daemon threads and names them in an easily debuggable manner. Supports assignable threads (as used by CommitProcessor) and non-assignable threads (as used here). |
performance | 239696 | No Perforce job exists for this issue. | 6 | 2559 | 2 years, 34 weeks, 3 days ago | There is a possibility of file descriptor leakage issue under high workload. Please upgrade to the latest version of JVM or the version that has a fix for this bug (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7118373) |
Incompatible change
|
1 | 0|i00sk7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
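The ExpiryQueue mentioned in the ZOOKEEPER-1504 description (factored out of SessionTrackerImpl so connections can be expired the same way as sessions) works by rounding each deadline up to an interval boundary, so everything expiring in the same interval shares one bucket. A hedged Python sketch of that bucketing idea — names and API are illustrative, not ZooKeeper's actual class:

```python
class ExpiryQueue:
    """Bucketed expiry: an element's deadline is rounded UP to the next
    multiple of expiry_interval, so elements expiring in the same interval
    share a bucket and can be expired together in one sweep."""

    def __init__(self, expiry_interval_ms):
        self.interval = expiry_interval_ms
        self.buckets = {}      # bucket deadline -> set of elements
        self.elem_bucket = {}  # element -> its current bucket deadline

    def update(self, elem, timeout_ms, now_ms):
        # round the deadline up to the next interval boundary
        bucket = ((int(now_ms + timeout_ms) // self.interval) + 1) * self.interval
        prev = self.elem_bucket.get(elem)
        if prev == bucket:
            return  # still in the same bucket; nothing to move
        if prev is not None:
            self.buckets[prev].discard(elem)
        self.buckets.setdefault(bucket, set()).add(elem)
        self.elem_bucket[elem] = bucket

    def poll_expired(self, now_ms):
        """Return every element whose bucket deadline has passed."""
        expired = set()
        for b in [b for b in self.buckets if b <= now_ms]:
            for e in self.buckets.pop(b):
                self.elem_bucket.pop(e, None)
                expired.add(e)
        return expired
```

The payoff is that touching an element (a session heartbeat, a connection read) is a cheap bucket move, and expiry is a single sweep per interval rather than a scan of every element.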
| ZooKeeper | ZOOKEEPER-1503 | remove redundant JAAS configuration code in SaslAuthTest and SaslAuthFailTest |
Improvement | Resolved | Major | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 05/Jul/12 14:03 | 31/Aug/12 22:12 | 01/Aug/12 15:23 | 3.4.4, 3.5.0 | 0 | 3 | ZOOKEEPER-1524, ZOOKEEPER-1497 | In SaslAuthTest and SaslAuthFailTest, we set the JAAS configuration twice with the same text string. This is confusing and redundant, since we need only set it once. | https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1120//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html | 242015 | No Perforce job exists for this issue. | 1 | 12500 | 7 years, 34 weeks, 1 day ago |
Reviewed
|
0|i02hw7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1502 | Prevent multiple zookeeper servers from using the same data directory |
Improvement | Resolved | Major | Won't Fix | Rakesh Radhakrishnan | Will Johnson | Will Johnson | 05/Jul/12 13:33 | 31/Mar/14 19:41 | 31/Mar/14 19:41 | 3.4.3 | 3.5.0 | server | 1 | 5 | We recently ran into an issue where two zookeepers servers which were a part of two separate quorums were configured to use the same data directory. Interestingly, the zookeeper servers did not seem to complain and both seemed to work fine until one of them was restarted. Once that happened all sort of chaos ensued. I understand that this is a misconfiguration should zookeeper complain about this or do users need to protect themselves in some external fashion? Is a simple file lock enough or are there other things I should take into consideration if it’s up to me to handle? | 242200 | No Perforce job exists for this issue. | 1 | 12747 | 5 years, 51 weeks, 3 days ago | 0|i02jf3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1501 | Nagios plugin always returns OK when it cannot connect to zookeeper |
Bug | Resolved | Major | Fixed | Brian Sutherland | Brian Sutherland | Brian Sutherland | 04/Jul/12 13:14 | 07/Sep/12 07:01 | 07/Sep/12 02:30 | 3.4.3 | 3.4.4, 3.5.0 | contrib | 0 | 4 | Returning OK under such conditions is really not good... | 242201 | No Perforce job exists for this issue. | 1 | 12748 | 7 years, 28 weeks, 6 days ago |
Reviewed
|
0|i02jfb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1500 | Nagios check always returns OK when the critical and warning values are the same |
Bug | Open | Minor | Unresolved | Brian Sutherland | Brian Sutherland | Brian Sutherland | 04/Jul/12 13:10 | 31/Jul/12 18:37 | contrib | 0 | 1 | The plugin requires a difference between the warning and critical value for the checks to work. If the values are the same, OK is always returned. I can't figure out how to attach a file to this ticket in JIRA, so here's a minimal inline patch that at least lets the admin know it's not working: {noformat} Index: src/contrib/monitoring/check_zookeeper.py =================================================================== --- src/contrib/monitoring/check_zookeeper.py (revision 1357335) +++ src/contrib/monitoring/check_zookeeper.py (working copy) @@ -57,6 +57,10 @@ print >>sys.stderr, 'Invalid values for "warning" and "critical".' return 2 + if warning == critical: + print >>sys.stderr, '"warning" and "critical" cannot have the same value.' + return 2 + if opts.key is None: print >>sys.stderr, 'You should specify a key name.' return 2 {noformat} |
242202 | No Perforce job exists for this issue. | 0 | 12749 | 7 years, 34 weeks, 2 days ago | 0|i02jfj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
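The ZOOKEEPER-1500 patch above rejects equal warning/critical values outright. An alternative is to make the threshold comparison itself robust so equal values still alarm. This is a hypothetical sketch of such logic, not check_zookeeper.py's actual code:

```python
def classify(value, warning, critical):
    """Return a nagios-style exit status (0=OK, 1=WARNING, 2=CRITICAL)
    for value against the two thresholds. Handles both ascending
    (warning < critical) and descending (warning > critical) checks;
    when warning == critical, a breach reports CRITICAL instead of
    the silent OK described in the bug report."""
    ascending = warning <= critical
    breached_crit = value >= critical if ascending else value <= critical
    breached_warn = value >= warning if ascending else value <= warning
    if breached_crit:
        return 2  # CRITICAL
    if breached_warn:
        return 1  # WARNING
    return 0      # OK
```

With this shape the warning==critical configuration degenerates cleanly into a single critical threshold rather than a dead check.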
| ZooKeeper | ZOOKEEPER-1499 | clientPort config changes not backwards-compatible |
Bug | Resolved | Blocker | Fixed | Alexander Shraer | Camille Fournier | Camille Fournier | 03/Jul/12 18:33 | 24/Oct/13 07:08 | 24/Oct/13 01:21 | 3.5.0 | 3.5.0 | server | 0 | 5 | With the new reconfig logic, clientPort=2181 in the zoo.cfg file no longer gets read, and clients can't connect without adding ;2181 to the end of their server lines. | 242203 | No Perforce job exists for this issue. | 4 | 12750 | 6 years, 22 weeks ago |
Reviewed
|
0|i02jfr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1498 | Zab1.0 sends NEWLEADER packet twice |
Bug | Resolved | Minor | Duplicate | Unassigned | Camille Fournier | Camille Fournier | 03/Jul/12 17:25 | 03/Jan/13 21:29 | 03/Jan/13 21:29 | 3.4.3, 3.5.0 | server | 0 | 3 | In pre-Zab1.0, we would process the NEWLEADER packet in registerWithLeader. Now we only process it in syncWithLeader, and in certain circumstances (the first follower of a new leader) it seems like we get 2 of them, which causes 2 snapshots to be taken one right after another. Not sure whether we should ignore taking the snapshot the second time, or not send two packets, or what. | 242204 | No Perforce job exists for this issue. | 0 | 12751 | 7 years, 11 weeks, 6 days ago | 0|i02jfz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1497 | Allow server-side SASL login with JAAS configuration to be programmatically set (rather than only by reading JAAS configuration file) |
Improvement | Resolved | Major | Fixed | Matteo Bertozzi | Matteo Bertozzi | Matteo Bertozzi | 03/Jul/12 17:08 | 26/Sep/12 14:17 | 30/Aug/12 14:30 | 3.4.3, 3.5.0 | 3.4.4, 3.5.0 | server | 0 | 5 | ZOOKEEPER-1455, ZOOKEEPER-1373, ZOOKEEPER-1503 | Currently the CnxnFactory checks for "java.security.auth.login.config" to decide whether or not to enable SASL. * zookeeper/server/NIOServerCnxnFactory.java * zookeeper/server/NettyServerCnxnFactory.java ** configure() checks for "java.security.auth.login.config" *** If present start the new Login("Server", SaslServerCallbackHandler(conf)) But since the SaslServerCallbackHandler does the right thing just checking if getAppConfigurationEntry() is empty, we can allow the SASL JAAS configuration to be set programmatically by just checking whether or not a configuration entry is present, instead of checking "java.security.auth.login.config". (Something quite similar was done for the SaslClient in ZOOKEEPER-1373) |
security | 242013 | No Perforce job exists for this issue. | 5 | 12498 | 7 years, 30 weeks ago |
Reviewed
|
0|i02hvr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1496 | Ephemeral node not getting cleared even after client has exited |
Bug | Resolved | Critical | Fixed | Rakesh Radhakrishnan | suja s | suja s | 28/Jun/12 06:24 | 17/Sep/12 07:02 | 17/Sep/12 03:58 | 3.4.3 | 3.4.4, 3.5.0 | server | 0 | 9 | In one of the tests we performed, came across a case where the ephemeral node was not getting cleared from zookeeper though the client exited. Zk version: 3.4.3 Ephemeral node still exists in Zookeeper: HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # date Tue Jun 26 16:07:04 IST 2012 HOST-xx-xx-xx-55:/home/Jun25_LR/install/zookeeper/bin # ./zkCli.sh -server xx.xx.xx.55:2182 Connecting to xx.xx.xx.55:2182 Welcome to ZooKeeper! JLine support is enabled [zk: xx.xx.xx.55:2182(CONNECTING) 0] WATCHER:: WatchedEvent state:SyncConnected type:None path:null [zk: xx.xx.xx.55:2182(CONNECTED) 0] get /hadoop-ha/hacluster/ActiveStandbyElectorLock haclusternn2HOSt-xx-xx-xx-102 �� cZxid = 0x200000075 ctime = Tue Jun 26 13:10:19 IST 2012 mZxid = 0x200000075 mtime = Tue Jun 26 13:10:19 IST 2012 pZxid = 0x200000075 cversion = 0 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x1382791d4e50004 dataLength = 42 numChildren = 0 [zk: xx.xx.xx.55:2182(CONNECTED) 1] Grepped logs at ZK side for session "0x1382791d4e50004" - close session and later create coming before closesession processed. 
HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000074 2012-06-26 13:10:18,834 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 2012-06-26 13:10:19,892 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 2012-06-26 13:10:19,919 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a 2012-06-26 13:10:20,608 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:closeSession cxid:0x0 zxid:0x200000074 txntype:-11 reqpath:n/a HOSt-xx-xx-xx-91:/home/Jun25_LR/install/zookeeper/logs # grep -E "/hadoop-ha/hacluster/ActiveStandbyElectorLock|0x1382791d4e50004" *|grep 0x200000075 2012-06-26 13:10:19,893 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::CommitProcessor@171] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 2012-06-26 13:10:19,920 [myid:3] - DEBUG [ProcessThread(sid:3 cport:-1)::Leader@716] - Proposing:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 2012-06-26 13:10:20,278 [myid:3] - DEBUG [LearnerHandler-/xx.xx.xx.102:13846:CommitProcessor@161] - Committing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a 2012-06-26 13:10:20,752 [myid:3] - DEBUG [CommitProcessor:3:FinalRequestProcessor@88] - Processing request:: sessionid:0x1382791d4e50004 type:create cxid:0x2 zxid:0x200000075 txntype:1 reqpath:n/a Close session and create requests coming almost parallely. 
Env: Hadoop setup. We were using Namenode HA with bookkeeper as shared storage and auto failover enabled. NN102 was active and NN55 was standby. FailoverController at 102 got shut down due to ZK connection error. The lock-ActiveStandbyElectorLock created (ephemeral node) by this failovercontroller is not cleared from ZK |
242205 | No Perforce job exists for this issue. | 5 | 12752 | 7 years, 27 weeks, 3 days ago |
Reviewed
|
0|i02jg7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1495 | ZK client hangs when using a function not available on the server. |
Bug | Closed | Minor | Fixed | Nicolas Liochon | Nicolas Liochon | Nicolas Liochon | 28/Jun/12 03:51 | 13/Mar/14 14:16 | 24/Jan/13 20:37 | 3.4.2, 3.3.5 | 3.4.6, 3.5.0 | server | 0 | 7 | ZOOKEEPER-1381, HBASE-5843 | all | This happens for example when using zk#multi with a 3.4 client but a 3.3 server. The issue seems to be on the server side: the server drops packets with an unknown OpCode in ZooKeeperServer#submitRequest {noformat} public void submitRequest(Request si) { // snip try { touch(si.cnxn); boolean validpacket = Request.isValid(si.type); // ===> Check on case OpCode.* if (validpacket) { // snip } else { LOG.warn("Dropping packet at server of type " + si.type); // if invalid packet drop the packet. } } catch (MissingSessionException e) { if (LOG.isDebugEnabled()) { LOG.debug("Dropping request: " + e.getMessage()); } } } {noformat} The solution discussed in ZOOKEEPER-1381 would be to raise an exception on the client side and then close the session. |
242206 | No Perforce job exists for this issue. | 4 | 12753 | 6 years, 2 weeks ago |
Reviewed
|
0|i02jgf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1494 | C client: socket leak after receive timeout in zookeeper_interest() |
Bug | Resolved | Major | Fixed | Michi Mutsuzaki | Michi Mutsuzaki | Michi Mutsuzaki | 22/Jun/12 15:56 | 10/Sep/12 07:01 | 10/Sep/12 03:04 | 3.4.2, 3.3.5 | 3.4.4, 3.5.0 | c client | 0 | 5 | In zookeeper_interest(), we set zk->fd to -1 without closing it when timeout happens. Instead we should let handle_socket_error_msg() function take care of closing the socket properly. --Michi |
242207 | No Perforce job exists for this issue. | 3 | 12754 | 7 years, 28 weeks, 3 days ago |
Reviewed
|
0|i02jgn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1493 | C Client: zookeeper_process doesn't invoke completion callback if zookeeper_close has been called |
Bug | Resolved | Major | Fixed | Michi Mutsuzaki | Michi Mutsuzaki | Michi Mutsuzaki | 20/Jun/12 16:47 | 21/Nov/12 05:12 | 29/Jul/12 01:35 | 3.4.3, 3.3.5 | 3.3.6, 3.4.4, 3.5.0 | c client | 0 | 7 | In ZOOKEEPER-804, we added a check in zookeeper_process() to see if zookeeper_close() has been called. This was to avoid calling assert(cptr) on a NULL pointer, as dequeue_completion() returns NULL if the sent_requests queue has been cleared by free_completion() from zookeeper_close(). However, we should still call the completion if it is not NULL. | 242208 | No Perforce job exists for this issue. | 3 | 12755 | 7 years, 18 weeks, 1 day ago |
Reviewed
|
0|i02jgv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1492 | leader cannot switch to LOOKING state when lost the majority |
Bug | Resolved | Critical | Duplicate | Unassigned | gaoxiao | gaoxiao | 20/Jun/12 08:51 | 20/Jun/12 10:18 | 20/Jun/12 10:18 | 3.4.3 | quorum | 0 | 2 | 604800 | 604800 | 0% | eclipse linux | When a follower leaves the cluster and the cluster cannot achieve a majority, the leader should leave the Leading state and enter the Looking state, but if there are some observers, the leader will not step down and the client cannot use the cluster. eg: The servers config: server.1=z1:2888:3888 server.2=z2:2888:3888 server.3=z3:2888:3888:observer At first, 1,2,3 are all started, it's all ok, 2 is the leader, but at this time, if 1 is stopped, 2 will not leave the Leading state, and the client cannot connect to the cluster. I think the problem is: (Leader.java method:lead) Line 388-407 syncedSet.add(self.getId()); synchronized (learners) { for (LearnerHandler f : learners) { if (f.synced()) { syncedCount++; syncedSet.add(f.getSid()); } f.ping(); } } if (!tickSkip && !self.getQuorumVerifier().containsQuorum(syncedSet)) { //if (!tickSkip && syncedCount < self.quorumPeers.size() / 2) { // Lost quorum, shutdown // TODO: message is wrong unless majority quorums used shutdown("Only " + syncedCount + " followers, need " + (self.getVotingView().size() / 2)); // make sure the order is the same! // the leader goes to looking return; } The code adds all learners to syncedSet, and I think at this place, only followers should be added to syncedSet, so the method 'containsQuorum' can figure out the majority. |
0% | 0% | 604800 | 604800 | 242209 | No Perforce job exists for this issue. | 0 | 12756 | 7 years, 40 weeks, 1 day ago | 0|i02jh3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
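The fix suggested in the ZOOKEEPER-1492 report above is to count only synced *voting* members toward the quorum, ignoring observers. A small Python model of that check — names and shapes are illustrative, not ZooKeeper's Leader.java API:

```python
def synced_quorum_ok(leader_id, learners, voting_members):
    """Decide whether the leader keeps leading.

    leader_id      -- sid of the leader (always counts as synced)
    learners       -- list of (sid, is_synced) pairs, followers AND observers
    voting_members -- set of participant sids (observers excluded)

    Only synced learners that are voting members count; a synced observer
    must not prop up a lost quorum, which is the bug described above."""
    synced = {leader_id}
    for sid, is_synced in learners:
        if is_synced and sid in voting_members:  # skip observers
            synced.add(sid)
    # simple majority quorum over the voting view
    return len(synced) > len(voting_members) // 2
```

In the reported scenario (servers 1 and 2 voting, 3 an observer, 2 leading, 1 stopped), the synced observer must not keep the check passing: the leader alone is not a majority of the two voting members.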
| ZooKeeper | ZOOKEEPER-1491 | Help for create command in zkCli is misleading |
Bug | Resolved | Major | Duplicate | Unassigned | Keith Turner | Keith Turner | 19/Jun/12 14:59 | 13/Dec/12 17:19 | 13/Dec/12 17:19 | 3.3.3 | 0 | 2 | When I type help the shell, I see the following for the create command. {noformat} create [-s] [-e] path data acl {noformat} However, the ACL is optional. So I think the usage message should look like the following. {noformat} create [-s] [-e] path data [acl] {noformat} |
242210 | No Perforce job exists for this issue. | 0 | 12757 | 7 years, 15 weeks ago | 0|i02jhb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1490 | If the configured log directory does not exist zookeeper will not start. Better to create the directory and start |
Bug | Resolved | Minor | Fixed | suja s | suja s | suja s | 19/Jun/12 00:10 | 30/Jun/12 07:01 | 30/Jun/12 02:08 | 3.4.4, 3.5.0 | scripts | 0 | 8 | If the configured log directory does not exist, zookeeper will not start. It would be better to create the directory and start. In zkEnv.sh we can change as follows: if [ "x${ZOO_LOG_DIR}" = "x" ] then ZOO_LOG_DIR="." else if [ ! -w "$ZOO_LOG_DIR" ] ; then mkdir -p "$ZOO_LOG_DIR" fi fi |
242211 | No Perforce job exists for this issue. | 3 | 12758 | 7 years, 38 weeks, 5 days ago |
Reviewed
|
0|i02jhj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1489 | Data loss after truncate on transaction log |
Bug | Resolved | Blocker | Fixed | Patrick D. Hunt | Christian Ziech | Christian Ziech | 18/Jun/12 05:09 | 18/Jul/12 07:01 | 17/Jul/12 17:29 | 3.4.3, 3.3.5 | 3.3.6, 3.4.4, 3.5.0 | server | 0 | 9 | Tested on Ubuntu 12.04 and CentOS 6, should be reproducible elsewhere | The truncate method on the transaction log in the class org.apache.zookeeper.server.persistence.FileTxnLog will reduce the file size to the required amount without either closing or re-positioning the logStream (which could also be dangerous since the truncate method is not synchronized against concurrent writes to the log). This causes the next append to that log to create a small "hole" in the file which java would interpret as binary zeroes when reading it. This then causes the FileTxnIterator.next() implementation to detect the end of the log file too early. I'll attach a small maven project with one junit test which can be used to reproduce the issue. Due to the blackbox nature of the test it will run for roughly 50 seconds unfortunately. Steps to reproduce: - Start an ensemble of zookeeper servers with at least 3 participants - Create one entry and then remove one of the servers from the ensemble temporarily (e.g. zk-2) - Create another entry which is hence only reflected on zk-1 and zk-3 - Take zk-1 out of the ensemble without shutting it down (that is important, I did that by interrupting the network connection to that node) and clean zk-3 - Bring back zk-2 and zk-3 so that they form a quorum - Allow zk-1 to connect again - zk-1 will receive a TRUNC message from zk-2 since zk-1 is now a minority knowing about that second node creation event - Create a third node - Force zk-1 to become master somehow - That third node will be gone |
242018 | No Perforce job exists for this issue. | 14 | 12506 | 7 years, 36 weeks, 1 day ago |
Reviewed
|
0|i02hxj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
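The ZOOKEEPER-1489 failure mode above (truncating the file without re-positioning the still-open log stream, so the next append leaves a zero-filled "hole") can be reproduced in miniature with two file handles. This Python demonstration models the mechanism only; it is not the FileTxnLog code:

```python
import os
import tempfile

def demonstrate_truncate_hole():
    """A writer keeps its stream offset while the file is truncated
    through another handle; its next write then lands past EOF, and the
    gap reads back as binary zeroes -- the 'hole' that makes the log
    iterator stop early."""
    path = os.path.join(tempfile.mkdtemp(), 'txnlog')
    writer = open(path, 'wb')
    writer.write(b'AAAA')
    writer.flush()            # file is 4 bytes; writer offset = 4
    os.truncate(path, 2)      # truncate behind the writer's back
    writer.write(b'BBBB')     # stale offset 4 -> bytes 2..3 become a hole
    writer.close()
    with open(path, 'rb') as f:
        return f.read()
```

The fix direction the report implies is to close or reposition the stream as part of truncate (and to synchronize truncate against concurrent appends).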
| ZooKeeper | ZOOKEEPER-1488 | Some links are not working in the Zookeeper Documentation |
Bug | Open | Minor | Unresolved | Unassigned | Kiran BC | Kiran BC | 15/Jun/12 02:13 | 01/Jul/15 19:53 | 3.4.3 | documentation | 0 | 3 | There are some internal link errors in the Zookeeper documentation. The list is as follows: docs\zookeeperAdmin.html -> tickTime and datadir docs\zookeeperOver.html -> fg_zkComponents, fg_zkPerfReliability and fg_zkPerfRW docs\zookeeperStarted.html -> Logging |
242212 | No Perforce job exists for this issue. | 0 | 12759 | 7 years, 14 weeks, 2 days ago | 0|i02jhr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1487 | If log4j.properties configuration parameters are not overridden by system properties then zookeeper is not able to create the log file. |
Bug | Resolved | Major | Invalid | Unassigned | Surendra Singh Lilhore | Surendra Singh Lilhore | 15/Jun/12 01:57 | 18/Jun/12 10:07 | 18/Jun/12 10:07 | server | 0 | 2 | In [ZOOKEEPER-980|https://issues.apache.org/jira/browse/ZOOKEEPER-980], log4j.properties provides some properties that may be overridden using system properties. For example JVMFLAGS="-Dzookeeper.root.logger=DEBUG,CONSOLE,ROLLINGFILE -Dzookeeper.console.threshold=DEBUG" bin/zkServer.sh start But if we do not override these properties using system properties, then zookeeper is not able to create the log file, meaning these properties do not take their default values. |
242213 | No Perforce job exists for this issue. | 0 | 12760 | 7 years, 40 weeks, 3 days ago | 0|i02jhz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1486 | A couple of bugs in the tutorial code |
Bug | Open | Minor | Unresolved | Unassigned | Dmitri Perelman | Dmitri Perelman | 12/Jun/12 17:07 | 20/Jul/12 18:11 | documentation | 1 | 2 | Hi, There are two problems with the barrier example code in the tutorial: 1) A znode created by a process in the function enter() is created with SEQUENTIAL suffix, however, the name of a znode deleted in the function leave() doesn't have this suffix. Actually, the leave() function tries to delete a nonexistent node => a KeeperException is thrown, which is caught silently => the process terminates without waiting for the barrier. 2) It seems that the very idea of leaving the barrier by deleting ephemeral nodes is problematic. Consider the following scenario: there are two clients: C1 and C2. - C1 enters the barrier, creates a znode /b1/C1, checks that it's alone and starts waiting for the second client to come. - C2 enters the barrier and creates a znode /b1/C2 - the notification to C1 is sent but still not delivered. - C2 observes that there are enough children to /b1, enters the barrier, executes its own operations and invokes leave() procedure. - during the leave() procedure C2 removes its znode /b1/C2 and exits. - when the notification about C2's arrival finally arrives to C1, C1 checks the children of /b1 and doesn't find C2's znode: C1 is stuck. The solution to this data race would be to create special znodes for leaving the barrier, similarly to the way they are created for entering the barrier. Thanks, Dima |
242214 | No Perforce job exists for this issue. | 0 | 12761 | 7 years, 35 weeks, 6 days ago | 0|i02ji7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1485 | client xid overflow is not handled |
Bug | Open | Major | Unresolved | Martin Kuchta | Michi Mutsuzaki | Michi Mutsuzaki | 12/Jun/12 14:32 | 08/Jul/16 12:29 | 3.4.3, 3.3.5 | c client, java client | 0 | 9 | ZOOKEEPER-2318 | Both Java and C clients use signed 32-bit int as XIDs. XIDs are assumed to be non-negative, and zookeeper uses some negative values as special XIDs (e.g. -2 for ping, -4 for auth). However, neither Java nor C client ensures the XIDs it generates are non-negative, and the server doesn't reject negative XIDs. Pat had some suggestions on how to fix this: - (bin-compat) Expire the session when the client sends a negative XID. - (bin-incompat) In addition to expiring the session, use 64-bit int for XID so that overflow will practically never happen. --Michi |
242215 | No Perforce job exists for this issue. | 1 | 12762 | 3 years, 36 weeks, 6 days ago | 0|i02jif: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
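The wraparound described above is plain Java int arithmetic. A minimal sketch (the counter logic is illustrative; the real clients keep the XID inside ClientCnxn): once the counter passes `Integer.MAX_VALUE`, it wraps to negative values that collide with the reserved special XIDs such as -2 (ping) and -4 (auth).

```java
public class XidOverflowDemo {
    // Neither client checks for overflow before handing out the next XID,
    // as described in the report.
    public static int nextXid(int lastXid) {
        return lastXid + 1;
    }

    public static void main(String[] args) {
        int wrapped = nextXid(Integer.MAX_VALUE);
        // Integer.MIN_VALUE, i.e. a negative value the server may
        // misinterpret as a special XID.
        System.out.println(wrapped);
    }
}
```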
| ZooKeeper | ZOOKEEPER-1484 | Missing znode found in the follower |
Bug | Resolved | Critical | Invalid | Thawan Kooburat | Thawan Kooburat | Thawan Kooburat | 11/Jun/12 17:36 | 15/Jun/12 22:04 | 15/Jun/12 22:04 | 3.4.3 | server | 0 | 0 | We noticed that one of the followers failed to restart due to a missing parent node {noformat} 2012-05-29 15:44:41,037 [myid:9] - INFO [main:FileSnap@83] - Reading snapshot /var/facebook/zeus-server/data/global-ropt.0/version-2/snapshot.3d001f19c9 2012-05-29 15:44:43,300 [myid:9] - ERROR [main:FileTxnSnapLog@220] - Parent /phpunittest/1862297546 missing for /phpunittest/1862297546/dir1 2012-05-29 15:44:43,302 [myid:9] - ERROR [main:QuorumPeer@488] - Unable to load database on disk java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /phpunittest/1862297546 {noformat} We believe that the root cause is a bug in the follower sync-up logic. Due to a race condition, the follower may miss some proposals. The log below shows that the follower sees the commit message but hasn't seen this proposal before {noformat} 2012-05-15 15:11:27,449 [myid:13] - WARN [QuorumPeer[myid=13]/0.0.0.0:2182:Learner@378] - Got zxid 0x3c00282dc9 expected 0x3c00282dca {noformat} I can reproduce this by repeatedly running FollowerResyncConcurrencyTest until a failure occurs. I suspect that the root cause is how we handle toBeApplied and outstandingProposals in the leader. 1. An in-flight proposal is removed from outstandingProposals before it is added to toBeApplied. Most of the problems I have seen so far seem to be caused by this gap. 2. startForwarding() iterates through outstandingProposals without locking PrepRequestProcessor properly, so there is a possibility of missing an in-flight proposal. |
242216 | No Perforce job exists for this issue. | 0 | 12763 | 7 years, 41 weeks, 3 days ago | Trunk seems to be OK. Found that our own effort to increase the concurrency on the leader caused the issue. | 0|i02jin: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1483 | Fix leader election recipe documentation |
Bug | Resolved | Major | Fixed | Michi Mutsuzaki | Ankur Bansal | Ankur Bansal | 11/Jun/12 17:09 | 14/Dec/12 17:11 | 14/Sep/12 03:34 | 3.4.3 | 3.4.4, 3.5.0 | documentation | 0 | 5 | ZOOKEEPER-1404 | The leader election recipe documentation suggests that, to avoid the herd effect, a client process volunteering for leadership via child znode [i] under the leader election path [/leader] must only watch the SMALLEST znode [j] from a different client process such that [j < i]. This will NOT avoid the herd effect, as many clients will end up watching the same znode [j], where j is the next-in-sequence number greater than the number of the current leader. Specifically, in Step 3 of the Election procedure here http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection this "where j is the SMALLEST sequence number" should be changed to "where j is the LARGEST sequence number" |
242217 | No Perforce job exists for this issue. | 2 | 12764 | 7 years, 27 weeks, 6 days ago | 0|i02jiv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
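The corrected recipe can be reduced to a pure selection problem. A sketch with a hypothetical helper (`watchTarget` is not ZooKeeper API): each candidate with sequence number i watches the znode with the *largest* sequence number j such that j < i, so only one client wakes when its immediate predecessor goes away, avoiding the herd effect.

```java
import java.util.List;

public class ElectionWatchDemo {
    // Returns the sequence number this candidate should watch,
    // or -1 if there is no predecessor (this candidate is the leader).
    public static int watchTarget(List<Integer> sequences, int mine) {
        int target = -1;
        for (int j : sequences) {
            if (j < mine && j > target) {
                target = j;
            }
        }
        return target;
    }

    public static void main(String[] args) {
        List<Integer> seqs = List.of(1, 4, 7, 9);
        System.out.println(watchTarget(seqs, 9)); // 7, not the smallest (1)
        System.out.println(watchTarget(seqs, 1)); // -1: leader
    }
}
```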
| ZooKeeper | ZOOKEEPER-1482 | Batch get to improve performance |
New Feature | Resolved | Major | Duplicate | zhiyuan.dai | zhiyuan.dai | zhiyuan.dai | 11/Jun/12 02:14 | 21/May/14 16:12 | 21/May/14 16:12 | 3.3.2, 3.4.3 | 3.5.0, 4.0.0 | server | 0 | 7 | Currently, ZooKeeper doesn't have a batch get feature, so I added one. The method is getChildrenData; we can use getChildrenData to fetch the data of a znode's children. |
242218 | No Perforce job exists for this issue. | 1 | 12765 | 5 years, 44 weeks, 1 day ago | 0|i02jj3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1481 | allow the C cli to run exists with a watcher |
Improvement | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 08/Jun/12 19:07 | 31/Aug/12 07:02 | 30/Aug/12 16:39 | 3.4.3 | 3.4.4, 3.5.0 | c client | 0 | 3 | Adds a wexists command and also improves the stdout (type string rather than just the number). Granted wexists is more for testing purposes than strictly necessary (we have exists already) but still worthwhile to add imo. | 242219 | No Perforce job exists for this issue. | 1 | 12766 | 7 years, 29 weeks, 6 days ago | 0|i02jjb: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1480 | ClientCnxn(1161) can't get the current zk server addr, so that - Session 0x for server null, unexpected error |
Bug | Open | Major | Unresolved | Leader Ni | Leader Ni | Leader Ni | 05/Jun/12 22:00 | 05/Feb/20 07:16 | 3.4.3 | 3.7.0, 3.5.8 | java client | 27/Jun/12 | 0 | 3 | When zookeeper encounters an unexpected error (not SessionExpiredException, SessionTimeoutException or EndOfStreamException), ClientCnxn (line 1161) will log a message in the format "Session 0x for server null, unexpected error, closing socket connection and attempting reconnect". The log is at line 1161 in zookeeper-3.3.3. We found that zookeeper uses "((SocketChannel)sockKey.channel()).socket().getRemoteSocketAddress()" to get the zookeeper addr. But sometimes it logs "Session 0x for server null"; when it logs null, the developer can't determine the zookeeper addr that the client is connected or connecting to. I added a method in class SendThread: InetSocketAddress org.apache.zookeeper.ClientCnxn.SendThread.getCurrentZooKeeperAddr(). Here: /** * Returns the address to which the socket is connected. * * @return ip address of the remote side of the connection or null if not * connected */ @Override SocketAddress getRemoteSocketAddress() { // a lot could go wrong here, so rather than put in a bunch of code // to check for nulls all down the chain let's do it the simple // yet bulletproof way ..... |
client, getCurrentZooKeeperAddr | 242220 | No Perforce job exists for this issue. | 2 | 12767 | 6 years, 24 weeks, 2 days ago | client,zookeeper addr,server | 0|i02jjj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1479 | C Client: zoo_add_auth() doesn't wake up the IO thread |
Bug | Open | Major | Unresolved | Unassigned | Michi Mutsuzaki | Michi Mutsuzaki | 03/Jun/12 20:48 | 05/Feb/20 07:15 | 3.4.3 | 3.7.0, 3.5.8 | c client | 0 | 3 | It can take up to sessionTimeout / 3 for the IO thread to send out the auth packet. The {{zoo_add_auth()}} function should call {{adaptor_send_queue(zh, 0)}} after calling {{send_last_auth_info(zh)}}. --Michi |
242221 | No Perforce job exists for this issue. | 0 | 12768 | 7 years, 42 weeks, 2 days ago | 0|i02jjr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1478 | Small bug in QuorumTest.testFollowersStartAfterLeader( ) |
Bug | Closed | Minor | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 02/Jun/12 23:03 | 13/Mar/14 14:16 | 13/Dec/12 02:19 | 3.4.3 | 3.4.6, 3.5.0 | tests | 0 | 6 | The following code appears in QuorumTest.testFollowersStartAfterLeader( ): for (int i = 0; i < 30; i++) { try { zk.create("/test", "test".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); break; } catch(KeeperException.ConnectionLossException e) { Thread.sleep(1000); } // test fails if we still can't connect to the quorum after 30 seconds. Assert.fail("client could not connect to reestablished quorum: giving up after 30+ seconds."); } From the comment it looks like the intention was to try to reconnect 30 times and only then trigger the Assert, but that's not what this does. After we fail to connect once and Thread.sleep is executed, Assert.fail will be executed without retrying create. |
239613 | No Perforce job exists for this issue. | 5 | 2416 | 6 years, 2 weeks ago |
Reviewed
|
0|i00rof: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
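The retry pattern the test intended can be sketched as follows. This is illustrative, not the committed patch: the failure case must be reached only after the loop exhausts its attempts, and the one-second sleep of the real test is elided to keep the sketch self-contained.

```java
public class RetryLoopDemo {
    public interface Op { void run() throws Exception; }

    // Retry up to `attempts` times; only report failure after *all*
    // attempts are exhausted -- not inside the loop body, where the
    // original Assert.fail sat.
    public static boolean retry(Op op, int attempts) {
        for (int i = 0; i < attempts; i++) {
            try {
                op.run();
                return true;          // success: stop retrying
            } catch (Exception e) {
                // the real test sleeps 1 second here before retrying
            }
        }
        return false;                  // only now should Assert.fail run
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // hypothetical flaky op: throws twice, then succeeds
        boolean ok = retry(() -> {
            if (calls[0]++ < 2) throw new Exception("connection loss");
        }, 30);
        System.out.println(ok + " after " + calls[0] + " calls"); // true after 3 calls
    }
}
```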
| ZooKeeper | ZOOKEEPER-1477 | Test failures with Java 7 on Mac OS X |
Bug | Resolved | Major | Not A Problem | Unassigned | Diwaker Gupta | Diwaker Gupta | 01/Jun/12 15:52 | 08/Oct/13 12:10 | 31/Aug/13 14:11 | 3.4.3 | server, tests | 8 | 20 | ZOOKEEPER-1550, GIRAPH-344 | Mac OS X Lion (10.7.4) Java version: java version "1.7.0_04" Java(TM) SE Runtime Environment (build 1.7.0_04-b21) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode) |
I downloaded ZK 3.4.3 sources and ran {{ant test}}. Many of the tests failed, including ZooKeeperTest. A common symptom was spurious {{ConnectionLossException}}: {code} 2012-06-01 12:01:23,420 [myid:] - INFO [main:JUnit4ZKTestRunner$LoggedInvokeMethod@54] - TEST METHOD FAILED testDeleteRecursiveAsync org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for / at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246) at org.apache.zookeeper.ZooKeeperTest.testDeleteRecursiveAsync(ZooKeeperTest.java:77) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ... (snipped) {code} As background, I was actually investigating some non-deterministic failures when using Netflix's Curator with Java 7 (see https://github.com/Netflix/curator/issues/79). After a while, I figured I should establish a clean ZK baseline first and realized it is actually a ZK issue, not a Curator issue. We are trying to migrate to Java 7 but this is a blocking issue for us right now. |
242222 | No Perforce job exists for this issue. | 1 | 12769 | 6 years, 29 weeks, 5 days ago | 0|i02jjz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1476 | ipv6 reverse dns related timeouts on OSX connecting to localhost |
Bug | Open | Minor | Unresolved | Unassigned | Jilles van Gurp | Jilles van Gurp | 01/Jun/12 09:01 | 03/Jul/14 18:34 | 2 | 7 | ZOOKEEPER-1661, ZOOKEEPER-1954 | We observed a weird, random issue trying to create zookeeper client connections on osx. Sometimes it would work and sometimes it would fail. It is also randomly very slow. It turns out both issues have the same cause. My hosts file on osx (which is an unmodified default one) lists three entries for localhost: 127.0.0.1 localhost ::1 localhost fe80::1%lo0 localhost We saw zookeeper trying to connect to fe80:0:0:0:0:0:0:1%1 sometimes, which is not listed (actually one in four times; it seems to round robin over the addresses). Whenever that happens, it sometimes works and sometimes fails. In both cases it's very slow. Reason: the reverse lookup for fe80:0:0:0:0:0:0:1%1 can't be resolved using the hosts file and it falls back to actually using the dns. Sometimes it works but other times it fails/times out after about 5 seconds. Probably platform-specific DNS settings hide this problem on linux. As a workaround, we pre-resolve localhost now: Inet4Address.getByName("localhost"). This always resolves to 127.0.0.1 on my machine and works fast. This fixes the issue for us. We're not sure where the fe80:0:0:0:0:0:0:1%1 address comes from, though. I don't recall having this issue with other server side software, so this might be a mix of platform setup, osx specific defaults, and zookeeper behavior. I've seen one ticket that relates to ipv6 in zookeeper that might be related: ZOOKEEPER-667. Perhaps the workaround for that ticket introduced this problem? |
242223 | No Perforce job exists for this issue. | 0 | 12770 | 5 years, 38 weeks ago | 0|i02jk7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
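The workaround in the report pins the client to the IPv4 loopback instead of letting "localhost" rotate through the hosts-file entries. A sketch using the standard `InetAddress` API (`isV4Loopback` is a hypothetical helper, not ZooKeeper API); note that resolving a literal like "127.0.0.1" never touches DNS, which is exactly why it avoids the slow reverse lookup.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LoopbackDemo {
    // True only when the literal parses to an IPv4 loopback address.
    public static boolean isV4Loopback(String literal) {
        try {
            InetAddress a = InetAddress.getByName(literal);
            return a instanceof Inet4Address && a.isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isV4Loopback("127.0.0.1")); // true
        System.out.println(isV4Loopback("::1"));       // false: IPv6 loopback
    }
}
```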
| ZooKeeper | ZOOKEEPER-1475 | Messages about missing JAAS configuration should not be logged at WARN level |
Improvement | Open | Major | Unresolved | Unassigned | Andrew Kyle Purtell | Andrew Kyle Purtell | 31/May/12 15:08 | 10/Sep/12 00:26 | 0 | 3 | HBASE-6099 | Messages about unconfigured JAAS settings probably should not be logged at WARN level because it's intentional if the user is not using any SASL based security features. The user may conclude that security is not optional, or that the missing JAAS configuration is behind failures that have an unrelated cause. Perhaps INFO level instead. | 242224 | No Perforce job exists for this issue. | 0 | 12771 | 7 years, 28 weeks, 3 days ago | 0|i02jkf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1474 | Cannot build Zookeeper with IBM Java: use of Sun MXBean classes |
Bug | Closed | Major | Fixed | Paulo Ricardo Paz Vital | Adalberto Medeiros | Adalberto Medeiros | 30/May/12 10:31 | 13/Mar/14 14:17 | 28/Nov/12 02:46 | 3.4.0, 3.4.3, 3.4.4, 3.4.5 | 3.4.6, 3.5.0 | build | 0 | 10 | ZOOKEEPER-1236, ZOOKEEPER-1565, ZOOKEEPER-1570, ZOOKEEPER-1571, ZOOKEEPER-1564 | zookeeper.server.NIOServerCnxn and zookeeper.server.NettyServerCnxn import com.sun.management.UnixOperatingSystemMXBean. This OperatingSystemMXBean class is not implemented by IBM or open java. In my case, I need IBM Java so I can run zookeeper on Power ppc64 servers. |
build | 242225 | No Perforce job exists for this issue. | 8 | 12772 | 6 years, 2 weeks ago |
Reviewed
|
0|i02jkn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
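One portable direction for the problem above (a sketch under stated assumptions, not the patch that was committed): instead of importing `com.sun.management.UnixOperatingSystemMXBean` directly, which fails to compile on JVMs that lack it, look the class up reflectively and degrade gracefully when it is absent.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class PortableFdCountDemo {
    // Returns the open file descriptor count, or -1 when this JVM does
    // not expose com.sun.management.UnixOperatingSystemMXBean (e.g. IBM J9,
    // or any non-Unix platform).
    public static long openFileDescriptorCount() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        try {
            Class<?> unix = Class.forName("com.sun.management.UnixOperatingSystemMXBean");
            if (unix.isInstance(os)) {
                return (Long) unix.getMethod("getOpenFileDescriptorCount").invoke(os);
            }
        } catch (ReflectiveOperationException e) {
            // fall through: class absent on this JVM
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFileDescriptorCount());
    }
}
```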
| ZooKeeper | ZOOKEEPER-1473 | Committed proposal log retains triple the memory it needs to |
Bug | Open | Major | Unresolved | Thawan Kooburat | Henry Robinson | Henry Robinson | 29/May/12 15:58 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | server | 1 | 7 | ZKDatabase.committedLog retains the past 500 transactions to enable fast catch-up. This works great, but it's using triple the memory it needs to by retaining three copies of the data part of any transaction. * The first is in committedLog[i].request.request.hb - a heap-allocated {{ByteBuffer}}. * The second is in committedLog[i].request.txn.data - a jute-serialised record of the transaction * The third is in committedLog[i].packet.data - also jute-serialised, seemingly uninitialised data. This means that a ZK-server could be using 1G of memory more than it should be in the worst case. We should use just one copy of the data, even if we really have to refer to it 3 times. |
242226 | No Perforce job exists for this issue. | 3 | 12773 | 5 years, 51 weeks, 4 days ago | 0|i02jkv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1472 | WatchedEvent class missing from documentation |
Bug | Open | Minor | Unresolved | Unassigned | David Nickerson | David Nickerson | 25/May/12 10:34 | 25/May/12 10:34 | 3.3.5 | documentation | 1 | 2 | org.apache.zookeeper.WatchedEvent is missing from the 3.3.5 documentation. | documentation | 242227 | No Perforce job exists for this issue. | 0 | 12774 | 7 years, 43 weeks, 6 days ago | 0|i02jl3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1471 | Jute generates invalid C++ code |
Bug | Resolved | Minor | Fixed | Michi Mutsuzaki | Michi Mutsuzaki | Michi Mutsuzaki | 20/May/12 21:10 | 30/Jun/12 07:01 | 30/Jun/12 02:44 | 3.4.3 | 3.4.4, 3.5.0 | jute | 0 | 4 | There are 2 issues with the current jute generated C++ code. 1. Variable declaration for JRecord is incorrect. It looks something like this: {code} Id id; {code} It should be like this instead: {code} org::apache::zookeeper::data::Id mid; {code} 2. The header file declares all the variables (except for JRecord ones) with "m" prefix, but the .cc file doesn't use the prefix. |
242228 | No Perforce job exists for this issue. | 1 | 12775 | 7 years, 38 weeks, 5 days ago |
Reviewed
|
0|i02jlb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1470 | zkpython: close() should delete any watcher |
Bug | Open | Minor | Unresolved | Unassigned | Paul Giannaros | Paul Giannaros | 20/May/12 13:00 | 20/May/12 14:31 | 3.4.3 | contrib-bindings | 0 | 2 | 3600 | 3600 | 0% | When calling zookeeper.close(handle), any connection watcher for the handle is not deleted. This is a source of memory leaks for applications that create and close lots of connections. Its damage can be mitigated to some degree by changing the watcher to some function that won't keep references to instances alive before calling close. The fix is just to add a free_pywatcher(..) call in the close sequence. Alternatively you could allow set_watcher(handle, None) as a way of deleting the watcher, but it's probably best to take care of it on close too. |
0% | 0% | 3600 | 3600 | memory_leak, python | 242229 | No Perforce job exists for this issue. | 0 | 12776 | 7 years, 44 weeks, 4 days ago | 0|i02jlj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1469 | Adding Cross-Realm support for secure Zookeeper client authentication |
Improvement | Reopened | Major | Unresolved | Eugene Joseph Koontz | Himanshu Vashishtha | Himanshu Vashishtha | 20/May/12 02:13 | 05/Feb/20 07:15 | 3.4.3 | 3.7.0, 3.5.8 | documentation | 0 | 11 | ZOOKEEPER-938, HBASE-6130 | There is a use case where one needs to support cross realm authentication for a zookeeper cluster. One use case is HBase Replication: HBase supports replicating data to multiple slave clusters, where the latter might be running in different realms. With current zookeeper security, the region servers of the master HBase cluster are not able to query the zookeeper quorum members of the slave cluster. This jira is about adding such Xrealm support. |
242230 | No Perforce job exists for this issue. | 1 | 12777 | 3 years, 39 weeks, 2 days ago | 0|i02jlr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1468 | Accurately name znode count in "four-letter words" |
Improvement | Open | Minor | Unresolved | Unassigned | Adam Rosien | Adam Rosien | 18/May/12 13:38 | 18/May/12 13:38 | 0 | 1 | The 'stat' and 'srvr' four-letter word commands refer to "Node Count" as the number of znodes, but this is an ambiguous label (cluster nodes? znodes?) I suggest renaming the label to "ZNode Count", or something similar. This will break existing parsers of the commands' output. |
242231 | No Perforce job exists for this issue. | 0 | 12778 | 7 years, 44 weeks, 6 days ago | 0|i02jlz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1467 | Make server principal configurable at client side. |
Improvement | Closed | Major | Fixed | Sujith Simon | Laxman | Laxman | 16/May/12 07:57 | 14/Feb/20 10:23 | 01/Oct/19 03:37 | 3.4.3, 3.4.4, 3.5.0 | 3.6.0, 3.5.7 | java client | 0 | 18 | 0 | 8400 | ZOOKEEPER-1373, ZOOKEEPER-1420, HBASE-1697, ZOOKEEPER-2257, ZOOKEEPER-2139, HBASE-4791, ZOOKEEPER-2433 | The server principal on the client side is derived using the hostname. org.apache.zookeeper.ClientCnxn.SendThread.startConnect() {code} try { zooKeeperSaslClient = new ZooKeeperSaslClient("zookeeper/"+addr.getHostName()); } {code} This may cause problems when an admin wants a customized principal like zookeeper/clusterid@HADOOP.COM, where clusterid is the cluster identifier rather than the host name. IMO, the server principal should also be configurable, as hadoop does. |
100% | 100% | 8400 | 0 | Security, client, kerberos, pull-request-available, sasl | 239707 | No Perforce job exists for this issue. | 2 | 2587 | 22 weeks, 3 days ago | Allow system property "zookeeper.clusterName", if defined, to be used as the instance portion of zookeeper server's Kerberos principal name. Otherwise, server's hostname will be used. | 0|i00sqf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
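The release note's fix direction, reduced to a sketch: an explicit cluster-name override (in ZooKeeper it arrives via the "zookeeper.clusterName" system property, per the release note) takes precedence over the hostname-derived instance part. `serverPrincipal` is a hypothetical helper with an explicit parameter in place of the real system-property plumbing.

```java
public class ServerPrincipalDemo {
    // Build the server principal: use the configured cluster name when
    // present, otherwise fall back to the resolved host name.
    public static String serverPrincipal(String clusterName, String hostName) {
        String instance = (clusterName != null) ? clusterName : hostName;
        return "zookeeper/" + instance;
    }

    public static void main(String[] args) {
        System.out.println(serverPrincipal(null, "zk1.example.com"));           // zookeeper/zk1.example.com
        System.out.println(serverPrincipal("prod-cluster", "zk1.example.com")); // zookeeper/prod-cluster
    }
}
```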
| ZooKeeper | ZOOKEEPER-1466 | QuorumCnxManager.shutdown missing synchronization |
Bug | Resolved | Blocker | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 15/May/12 13:21 | 30/Jun/12 07:01 | 29/Jun/12 16:13 | 3.4.0, 3.3.5, 3.5.0 | 3.3.6, 3.4.4, 3.5.0 | quorum | 0 | 4 | org.apache.zookeeper.server.quorum.QuorumCnxManager.shutdown is not being synchronized even though it's accessed by multiple threads. | 242232 | No Perforce job exists for this issue. | 1 | 12779 | 7 years, 38 weeks, 5 days ago |
Reviewed
|
0|i02jm7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
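The shape of the fix can be sketched in a few lines (this class is illustrative, not QuorumCnxManager itself): when several threads may call shutdown while others read the shared state, both the mutation and the readers must synchronize on the same monitor so the write is visible.

```java
public class ShutdownSyncDemo {
    private boolean shutdown = false;

    // Guarding the write with the object's monitor makes it visible to
    // every thread that also synchronizes on this object.
    public synchronized void shutdown() {
        shutdown = true;
    }

    public synchronized boolean isShutdown() {
        return shutdown;
    }

    public static void main(String[] args) throws InterruptedException {
        ShutdownSyncDemo mgr = new ShutdownSyncDemo();
        Thread t = new Thread(mgr::shutdown);
        t.start();
        t.join();
        System.out.println(mgr.isShutdown()); // true
    }
}
```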
| ZooKeeper | ZOOKEEPER-1465 | Cluster availability following new leader election takes a long time with large datasets - is correlated to dataset size |
Bug | Resolved | Critical | Fixed | Camille Fournier | Alex Gvozdenovic | Alex Gvozdenovic | 10/May/12 10:47 | 17/Jul/12 20:33 | 05/Jul/12 12:50 | 3.4.3 | 3.4.4, 3.5.0 | leaderElection | 0 | 12 | When re-electing a new leader of a cluster, it takes a long time for the cluster to become available if the dataset is large Test Data ---------- 650mb snapshot size 20k nodes of varied size 3 member cluster On 3.4.x branch (http://svn.apache.org/repos/asf/zookeeper/branches/branch-3.4?r=1244779) ------------------------------------------------------------------------------------------ Takes 3-4 minutes to bring up a cluster from cold Takes 40-50 secs to recover from a leader failure Takes 10 secs for a new follower to join the cluster Using the 3.3.5 release on the same hardware with the same dataset ----------------------------------------------------------------- Takes 10-20 secs to bring up a cluster from cold Takes 10 secs to recover from a leader failure Takes 10 secs for a new follower to join the cluster I can see from the logs in 3.4.x that once a new leader is elected, it pushes a new snapshot to each of the followers, who need to save it before they ack the leader, who can then mark the cluster as available. The kit being used is a low spec vm so the times taken are not relevant per se - more the fact that a snapshot is always sent even though there is no difference between the persisted state on each peer. No data is being added to the cluster while the peers are being restarted. |
238940 | No Perforce job exists for this issue. | 5 | 12502 | 7 years, 36 weeks, 2 days ago | 0|i02hwn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1464 | document that event notification is single threaded in java/c client implementations |
Improvement | Open | Major | Unresolved | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 09/May/12 16:41 | 09/May/12 16:41 | documentation | 0 | 0 | The docs don't currently mention that there's a single thread delivering watches. Callees should be aware of this; typically it means don't make blocking calls (esp. on other events!) and limit the time spent in the routine. | 238813 | No Perforce job exists for this issue. | 0 | 12780 | 7 years, 46 weeks, 1 day ago | 0|i02jmf: |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
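The behavior this issue asks to document can be modeled with a single-threaded executor (a sketch; the real client uses its own EventThread): because one thread delivers every callback in order, a blocking call inside any callback stalls all later events.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SingleEventThreadDemo {
    // Deliver `events` callbacks on one thread and record the order.
    public static List<String> deliverInOrder(int events) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        for (int i = 0; i < events; i++) {
            final int id = i;
            // A Thread.sleep or blocking wait here would delay every
            // later event too -- hence the advice to keep callbacks short.
            eventThread.submit(() -> log.add("event-" + id));
        }
        eventThread.shutdown();
        try {
            eventThread.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(deliverInOrder(3)); // [event-0, event-1, event-2]
    }
}
```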
| ZooKeeper | ZOOKEEPER-1463 | external inline function is not compatible with C99 |
Bug | Resolved | Major | Duplicate | Michael Hu | Michael Hu | Michael Hu | 07/May/12 17:52 | 11/May/12 01:26 | 11/May/12 01:26 | 3.4.3, 3.3.5 | 3.4.4, 3.5.0 | build | 0 | 0 | 360 | 360 | 0% | debian linux x64 | There is a use of an external inline function in the zookeeper hashtable_itr.h file, which is not compatible with C99. This causes problems when compiling with other libraries like a code coverage library. --- hashtable_itr.h:37: error: 'cov_v_cab2c78b' is static but used in inline function 'hashtable_iterator_key' which is not static --- The easy fix would be to put the following line in hashtable_itr.c, which ignores this inline warning. #pragma GCC diagnostic ignored "-Winline" |
0% | 0% | 360 | 360 | external, inline | 238471 | No Perforce job exists for this issue. | 1 | 12781 | 7 years, 45 weeks, 6 days ago | 0|i02jmn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1462 | Read-only server does not initialize database properly |
Bug | Closed | Critical | Fixed | Thawan Kooburat | Thawan Kooburat | Thawan Kooburat | 02/May/12 21:37 | 13/Mar/14 14:16 | 02/Oct/13 18:42 | 3.4.3 | 3.4.6 | server | 0 | 5 | ZOOKEEPER-1552 | Brief Description: When a participant or observer gets partitioned and restarts as a read-only server, ZkDb doesn't get reinitialized. This causes the RO server to drop any incoming request with zxid > 0 Error message: Refusing session request for client /xx.xx.xx.xx:39875 as it has seen zxid 0x2e00405fd9 our last zxid is 0x0 client must try another server Steps to reproduce: Start an RO-enabled observer connecting to an ensemble. Kill the ensemble and wait until the observer restarts in RO mode. The zxid of this observer should be 0. Description: Before a server transitions into the LOOKING state, its database gets closed as part of the shutdown sequence. The databases of the leader, follower and observer get initialized as a side effect of participating in the leader election protocol (e.g. an observer will call registerWithLeader() and call getLastLoggedZxid(), which initializes the db if not already initialized). However, an RO server does not participate in this protocol, so its DB doesn't get initialized properly |
237890 | No Perforce job exists for this issue. | 1 | 12782 | 6 years, 2 weeks ago | 0|i02jmv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1461 | Zookeeper C client doesn't check for NULL before dereferencing in prepend_string |
Improvement | Resolved | Major | Duplicate | Stephen Tyree | Stephen Tyree | Stephen Tyree | 01/May/12 16:35 | 29/Jul/12 01:37 | 02/May/12 10:07 | 3.3.5 | c client | 0 | 0 | 0 | 0 | 0% | ZOOKEEPER-1305 | prepend_string, called before any checks for NULL in the c client for many API functions, has this line (zookeeper 3.3.5): if (zh->chroot == NULL) That means that before you check for NULL, you are dereferencing the pointer. This bug does not exist in the 3.4.* branch for whatever reason, but it still remains in the 3.3.* line. A patch which fixes it would make the line as follows: if (zh == NULL || zh->chroot == NULL) I would do that for you, but I don't know how to patch the 3.3.5 branch. |
0% | 0% | 0 | 0 | 237704 | No Perforce job exists for this issue. | 1 | 12783 | 7 years, 47 weeks, 1 day ago | 0|i02jn3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1460 | IPv6 literal address not supported for quorum members |
Bug | Closed | Major | Fixed | Joseph Walton | Chris Dolan | Chris Dolan | 30/Apr/12 15:49 | 21/Jul/16 16:18 | 23/Jun/16 16:21 | 3.4.3 | 3.5.2, 3.6.0 | quorum | 5 | 19 | ZOOKEEPER-2452 | Via code inspection, I see that the "server.nnn" configuration key does not support literal IPv6 addresses because the property value is split on ":". In v3.4.3, the problem is in QuorumPeerConfig: {noformat} String parts[] = value.split(":"); InetSocketAddress addr = new InetSocketAddress(parts[0], Integer.parseInt(parts[1])); {noformat} In the current trunk (http://svn.apache.org/viewvc/zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java?view=markup) this code has been refactored into QuorumPeer.QuorumServer, but the bug remains: {noformat} String serverClientParts[] = addressStr.split(";"); String serverParts[] = serverClientParts[0].split(":"); addr = new InetSocketAddress(serverParts[0], Integer.parseInt(serverParts[1])); {noformat} This bug probably affects very few users because most will naturally use a hostname rather than a literal IP address. But given that IPv6 addresses are supported for clients via ZOOKEEPER-667 it seems that server support should be fixed too. |
237568 | No Perforce job exists for this issue. | 6 | 12784 | 3 years, 39 weeks ago | IPv6 addresses are now properly parsed in the config |
Reviewed
|
0|i02jnb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
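One IPv6-tolerant parsing sketch (`parse` is a hypothetical helper; the committed fix may differ): accept the bracketed form "[::1]:2888" and otherwise split on the *last* colon rather than the first. A bare unbracketed IPv6 literal like "fe80::1:2888" remains ambiguous under last-colon splitting, which is why the bracketed convention exists.

```java
public class HostPortParseDemo {
    // Returns { host, port } for "host:port" or "[ipv6-literal]:port".
    public static String[] parse(String value) {
        if (value.startsWith("[")) {
            int close = value.indexOf(']');
            return new String[] { value.substring(1, close),
                                  value.substring(close + 2) };
        }
        int last = value.lastIndexOf(':');
        return new String[] { value.substring(0, last),
                              value.substring(last + 1) };
    }

    public static void main(String[] args) {
        String[] a = parse("zk1.example.com:2888");
        String[] b = parse("[fe80::1]:2888");
        System.out.println(a[0] + " / " + a[1]); // zk1.example.com / 2888
        System.out.println(b[0] + " / " + b[1]); // fe80::1 / 2888
    }
}
```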
| ZooKeeper | ZOOKEEPER-1459 | ZOOKEEPER-1833 Standalone ZooKeeperServer is not closing the transaction log files on shutdown |
Sub-task | Closed | Major | Fixed | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 30/Apr/12 04:11 | 19/Dec/19 12:30 | 07/Dec/13 05:19 | 3.4.0 | 3.4.6, 3.5.0 | server | 0 | 9 | When shutting down the standalone ZK server, it only clears the zkdatabase and does not close the transaction log streams. When unit tests on Windows try to delete the temporary files, they fail. ZooKeeperServer.java {noformat} if (zkDb != null) { zkDb.clear(); } {noformat} Suggestion: close the zkDb as follows; this in turn will take care of the transaction logs: {noformat} if (zkDb != null) { zkDb.clear(); try { zkDb.close(); } catch (IOException ie) { LOG.warn("Error closing logs ", ie); } } {noformat} |
237452 | No Perforce job exists for this issue. | 10 | 12785 | 5 years, 44 weeks, 2 days ago |
Incompatible change
|
0|i02jnj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1458 | Parent's cversion doesn't match the sequence number that get assigned to a child node with the SEQUENTIAL flag on. |
Bug | Resolved | Major | Not A Problem | Patrick D. Hunt | Andrey Kornev | Andrey Kornev | 29/Apr/12 21:00 | 30/Apr/12 19:10 | 30/Apr/12 19:10 | 3.4.3 | server | 0 | 0 | All | If I have a child delete op interleaving two child create ops, the second child create will nevertheless have the path suffix incremented only by 1 rather than by 2. Is this expected? The 3.3.5 version takes into account the delete and increments the sequence by 2. PrepRequestProcessor uses the parent's cversion to generate the child's sequence suffix. However it appears that this particular cversion only counts "create" operations and it doesn't take into account the deletes. Strangely enough, the parent stats returned by getData() show the correct cversion with all the creates and deletes accounted for. It looks like the first cversion comes from the ChangeRecord for the parent node stuck in ZooKeeperServer.outstandingChangesForPath map. And the second one (returned by getData(), that is) comes from the DataTree. Here's a simple example that reproduces the situation. zk.create("/parent", null, OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); Stat stat = new Stat(); zk.getData("/parent", false, stat); stat.getCVersion(); // returns 0 -- expected; String actualPath = zk.create("/parent/child", null, OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL); // actualPath is "/parent/child0000000000" -- expected. zk.getData("/parent", false, stat); stat.getCVersion(); // returns 1 -- expected; zk.getData(actualPath, false, stat); zk.delete(actualPath,stat.getVersion()); // delete the child node zk.getData("/parent", false, stat); stat.getCVersion(); // returns 2; // create another child actualPath = zk.create("/parent/child", null, OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL); // returned "/parent/child0000000001" but expected "/parent/child0000000002" zk.getData("/parent", false, stat); stat.getCVersion(); // returns 3; |
237430 | No Perforce job exists for this issue. | 0 | 12786 | 7 years, 47 weeks, 3 days ago | 0|i02jnr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1457 | Ephemeral node deleted for unexpired sessions |
Bug | Open | Major | Unresolved | Unassigned | Neha Narkhede | Neha Narkhede | 26/Apr/12 15:25 | 30/Apr/12 15:00 | 3.3.4 | 0 | 5 | This week, we saw a potential bug with zookeeper 3.3.4. In an attempt to add a separate disk for zookeeper transaction logs, our SysOps team threw new disks at all the zookeeper servers in our production cluster at around the same time. Right after this, we saw degraded performance on our zookeeper cluster. And yes, I agree that this degraded behavior is expected and we could've done a better job and upgraded one server at a time. However, the observed impact was that ephemeral nodes got deleted without session expiration on the zookeeper clients. Let me try and describe what I've observed from the Kafka and ZK server logs - Kafka client has a session established with ZK, say Session A, that it has been using successfully. At the time of the degraded ZK performance issue, Session A expires. Kafka's ZkClient tries to establish another session with ZK. After 9 seconds, it establishes a session, say Session B and tries to use it for creating a znode. This operation fails with a NodeExists error since another session, say session C, has created that znode. This is considered OK since ZkClient retries an operation transparently if it gets disconnected and sometimes you can get NodeExists. But then later, session C expires and hence the ephemeral node is deleted from ZK. This leads to unexpected errors in Kafka since its session, Session B, is still valid and hence it expects the znode to be there. The issue is that session C was established, created the znode and expired, without the zookeeper client on Kafka ever knowing about it. |
236698 | No Perforce job exists for this issue. | 0 | 32547 | 7 years, 47 weeks, 3 days ago | 0|i05xmn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1456 | sessions cannot specify whether they require kerberos authenticated sessions or not |
Bug | Open | Major | Unresolved | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 26/Apr/12 13:17 | 27/Apr/12 10:42 | 0 | 4 | ZOOKEEPER-1455 | When creating a session there is no way for a client to specify that they require a kerberos (via sasl) authenticated session. Similarly there is no way to request an unauthenticated session if kerberos has been configured at the jvm level. | 236849 | No Perforce job exists for this issue. | 0 | 32548 | 7 years, 48 weeks ago | 0|i05xmv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1455 | there is no way to determine if a session is sasl authenticated or not |
Bug | Open | Critical | Unresolved | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 26/Apr/12 13:12 | 25/Sep/13 05:53 | 3 | 9 | ZOOKEEPER-1456, ZOOKEEPER-1437, ZOOKEEPER-1764, ZOOKEEPER-1381, ZOOKEEPER-1497, HADOOP-8315 | The ZooKeeper interface provides no way to determine if the session is sasl authenticated or not. There is an event sent to the watcher when the sasl authentication completes, however there no way to determine if there is intent to negotiate via sasl. As a result the event cannot be used to wait to send messages until the authentication has completed. see HADOOP-8315 | 236850 | No Perforce job exists for this issue. | 0 | 32549 | 7 years, 46 weeks ago | 0|i05xn3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1454 | Document how to run autoreconf if cppunit is installed in a non-standard directory |
Improvement | Resolved | Trivial | Fixed | Michi Mutsuzaki | Michi Mutsuzaki | Michi Mutsuzaki | 25/Apr/12 17:18 | 30/Jun/12 07:01 | 30/Jun/12 02:51 | 3.3.6, 3.4.4, 3.5.0 | c client | 0 | 3 | By default, the source distribution of cppunit is installed under /usr/local. When you run {{autoreconf -if}}, you get an error like this:
{code}
$ autoreconf -if
configure.ac:37: warning: macro `AM_PATH_CPPUNIT' not found in library
configure.ac:37: warning: macro `AM_PATH_CPPUNIT' not found in library
configure.ac:37: error: possibly undefined macro: AM_PATH_CPPUNIT
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
autoreconf: /usr/local/bin/autoconf failed with exit status: 1
{code}
This is because {{cppunit.m4}} is installed under /usr/local/share/aclocal, but aclocal only looks at {{/usr/share/aclocal-$VERSION}} and {{/usr/share/aclocal}}, assuming it was configured with {{--prefix=/usr}}. There are 3 ways to specify additional paths.
1. Set {{ACLOCAL}}.
{code}
ACLOCAL="aclocal -I /usr/local/share/aclocal" autoreconf -if
{code}
2. Set {{ACLOCAL_PATH}}.
{code}
ACLOCAL_PATH=/usr/local/share/aclocal autoreconf -if
{code}
3. Set {{ACLOCAL_FLAGS}}.
{code}
ACLOCAL_FLAGS="-I /usr/local/share/aclocal" autoreconf -if
{code}
Apparently older versions of autoreconf don't respect ACLOCAL_PATH or ACLOCAL_FLAGS, so using ACLOCAL is probably the best way to fix it. I'll update src/c/README to document this. --Michi |
236696 | No Perforce job exists for this issue. | 1 | 33278 | 7 years, 38 weeks, 5 days ago |
Reviewed
|
0|i06253: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1453 | corrupted logs may not be correctly identified by FileTxnIterator |
Bug | Open | Critical | Unresolved | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 24/Apr/12 18:05 | 18/Mar/16 16:05 | 3.3.3 | server | 1 | 7 | See ZOOKEEPER-1449 for background on this issue. The main problem is that during server recovery org.apache.zookeeper.server.persistence.FileTxnLog.FileTxnIterator.next() does not indicate if the available logs are valid or not. In some cases (say a truncated record and a single txnlog in the datadir) we will not detect that the file is corrupt, vs reaching the end of the file. | 236868 | No Perforce job exists for this issue. | 6 | 32550 | 4 years, 6 days ago | 0|i05xnb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1452 | zoo_multi() & zoo_amulti() update operations for zkpython |
Improvement | Patch Available | Major | Unresolved | Aravind Narayanan | Aravind Narayanan | Aravind Narayanan | 19/Apr/12 20:46 | 05/Feb/20 07:11 | 3.7.0, 3.5.8 | contrib-bindings | 1 | 6 | 1209600 | 1209600 | 0% | ZooKeeper's python bindings (src/contrib/zkpython) are missing multi-update support ({{zoo_multi()}} & {{zoo_amulti()}}) that was added to the C client recently. This issue is to bridge this gap, and add support for multi-update operations to the Python bindings. | 0% | 0% | 1209600 | 1209600 | python | 236509 | No Perforce job exists for this issue. | 4 | 2513 | 3 years, 39 weeks, 2 days ago | Adds new `zoo_multi()` && `zoo_amulti()` functionality to the zkpython bindings for zookeeper. Includes a new unit test. Also used the functions from a python program that uses zkpython. All existing unit tests still pass. |
python, zkpython | 0|i00s9z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1451 | C API improperly logs getaddrinfo failures on Linux when using glibc |
Bug | Resolved | Trivial | Fixed | Stephen Tyree | Stephen Tyree | Stephen Tyree | 19/Apr/12 14:04 | 25/Apr/12 16:29 | 25/Apr/12 16:29 | 3.4.3 | 3.5.0 | c client | 0 | 1 | Linux when using glibc | This is how the code currently logs getaddrinfo errors: {quote} errno = getaddrinfo_errno(rc); LOG_ERROR(("getaddrinfo: %s\n", strerror(errno))); {quote} On Linux, specifically when using glibc, there is a better function for logging getaddrinfo errors called gai_strerror. An example: {quote} LOG_ERROR(("getaddrinfo: %s\n", gai_strerror(rc))); {quote} It doesn't miss a lot of cases like the errno based version does. |
236460 | No Perforce job exists for this issue. | 1 | 32551 | 7 years, 48 weeks, 1 day ago | 0|i05xnj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1450 | Backport ZOOKEEPER-1294 fix to 3.4 and 3.3 |
Task | Resolved | Major | Fixed | Norman Bishop | Norman Bishop | Norman Bishop | 19/Apr/12 13:37 | 02/Mar/16 20:37 | 22/Apr/12 15:28 | 3.4.3, 3.3.5 | 3.3.6, 3.4.4 | server | 0 | 0 | The bug from ZOOKEEPER-1294 affects 3.4 and 3.3 as well, and the patch should be backported. | 236459 | No Perforce job exists for this issue. | 3 | 33279 | 7 years, 48 weeks, 4 days ago | 0|i0625b: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1449 | Ephemeral znode not deleted after session has expired on one follower (quorum is in an inconsistent state) |
Bug | Resolved | Major | Cannot Reproduce | Patrick D. Hunt | Daniel Lord | Daniel Lord | 17/Apr/12 14:26 | 02/Oct/13 12:16 | 02/Oct/13 05:58 | 0 | 2 | ZOOKEEPER-1777 | I've been running into this situation in our labs fairly regularly where we'll get a single follower that will be in an inconsistent state with dangling ephemeral znodes. Here is all of the information that I have right now. Please ask if there is anything else that is useful. Here is a quick snapshot of the state of the ensemble where you can see it is out of sync across several znodes:
-bash-3.2$ echo srvr | nc il23n04sa-zk001 2181
Zookeeper version: 3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
Latency min/avg/max: 0/7/25802
Received: 64002
Sent: 63985
Outstanding: 0
Zxid: 0x500000a41
Mode: follower
Node count: 497
-bash-3.2$ echo srvr | nc il23n04sa-zk002 2181
Zookeeper version: 3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
Latency min/avg/max: 0/13/79032
Received: 74320
Sent: 74276
Outstanding: 0
Zxid: 0x500000a41
Mode: leader
Node count: 493
-bash-3.2$ echo srvr | nc il23n04sa-zk003 2181
Zookeeper version: 3.3.3-cdh3u2--1, built on 10/14/2011 05:17 GMT
Latency min/avg/max: 0/2/25234
Received: 187310
Sent: 187320
Outstanding: 0
Zxid: 0x500000a41
Mode: follower
Node count: 493
All of the zxids match up just fine but zk001 has 4 more nodes in its node count than the other two (including the leader..). When I use a zookeeper client to connect directly to zk001 I can see the following znode that should no longer exist:
[zk: localhost:2181(CONNECTED) 0] stat /siri/Douroucouli/clients/il23n04sa-app004.siri.apple.com:38096
cZxid = 0x40000001a
ctime = Mon Apr 16 11:00:47 PDT 2012
mZxid = 0x40000001a
mtime = Mon Apr 16 11:00:47 PDT 2012
pZxid = 0x40000001a
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x236bc504cb50002
dataLength = 0
numChildren = 0
This node does not exist using the client to connect to either of the other two members of the ensemble.
I searched through the logs for that session id and it looks like it was established and closed cleanly. There were several leadership/quorum problems during the course of the session but it looks like it should have been shut down properly. Neither the session nor the znode show up in the "dump" on the leader but the problem znode does show up in the "dump" on zk001.
2012-04-16 11:00:47,637 - INFO [CommitProcessor:2:NIOServerCnxn@1580] - Established session 0x236bc504cb50002 with negotiated timeout 15000 for client /17.202.71.201:38971
2012-04-16 11:20:59,341 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770] - Client attempting to renew session 0x236bc504cb50002 at /17.202.71.201:50841
2012-04-16 11:20:59,342 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1580] - Established session 0x236bc504cb50002 with negotiated timeout 15000 for client /17.202.71.201:50841
2012-04-16 11:21:09,343 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x236bc504cb50002, likely client has closed socket
2012-04-16 11:21:09,343 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /17.202.71.201:50841 which had sessionid 0x236bc504cb50002
2012-04-16 11:21:20,352 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1435] - Closed socket connection for client /17.202.71.201:38971 which had sessionid 0x236bc504cb50002
2012-04-16 11:21:22,151 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@770] - Client attempting to renew session 0x236bc504cb50002 at /17.202.71.201:38166
2012-04-16 11:21:22,152 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:NIOServerCnxn@1580] - Established session 0x236bc504cb50002 with negotiated timeout 15000 for client /17.202.71.201:38166
2012-04-16 11:27:17,902 - INFO [ProcessThread:-1:PrepRequestProcessor@387] - Processed session termination for sessionid: 0x236bc504cb50002
2012-04-16 11:27:17,904 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /17.202.71.201:38166 which had sessionid 0x236bc504cb50002
The only way I've been able to recover from this situation is to shut down the problem follower, delete its snapshots and let it resync with the leader. I'll attach the full log4j logs, the txn logs, the snapshots and the conf files for each member of the ensemble. Please let me know what other information is useful. |
236145 | No Perforce job exists for this issue. | 1 | 32552 | 6 years, 25 weeks, 1 day ago | 0|i05xnr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1448 | Node+Quota creation in transaction log can crash leader startup |
Bug | Closed | Critical | Fixed | Flavio Paiva Junqueira | Botond Hejj | Botond Hejj | 17/Apr/12 12:15 | 13/Mar/14 14:17 | 05/Sep/13 17:50 | 3.3.5 | 3.4.6, 3.5.0 | server | 0 | 7 | Hi, I've found a bug in zookeeper related to quota creation which can shut down the zookeeper leader on startup. Steps to reproduce:
1. create /quota_bug
2. setquota -n 10000 /quota_bug
3. stop the whole ensemble (the previous operations should be in the transaction log)
4. start all the servers
5. the elected leader will shutdown with an exception (Missing stat node for count /zookeeper/quota/quota_bug/zookeeper_stats)
I've debugged a bit of what is happening and found the following problem: On startup each server loads the last snapshot and replays the last transaction log. While doing this it fills up the pTrie variable of the DataTree with the path of the nodes which have quota. After the leader is elected, the leader server loads the snapshot and last transaction log but it doesn't clean up the pTrie variable. This means it still contains the "/quota_bug" path. Now when the "create /quota_bug" is processed from the transaction log the DataTree already thinks that the quota nodes ("/zookeeper/quota/quota_bug/zookeeper_limits" and "/zookeeper/quota/quota_bug/zookeeper_stats") are created, but those node creations actually come later in the transaction log. This leads to the missing stat node exception. I think clearing the pTrie should solve this problem. |
236123 | No Perforce job exists for this issue. | 7 | 32553 | 6 years, 2 weeks ago | 0|i05xnz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1447 | Per-connection network throttling to improve QoS |
Improvement | Open | Major | Unresolved | Unassigned | Thawan Kooburat | Thawan Kooburat | 16/Apr/12 19:56 | 16/Apr/12 19:56 | 3.4.3 | server | 1 | 0 | Some clients may be heavy bandwidth users. It is possible for the total network traffic to reach NIC capacity, at which point service quality starts to degrade. We don't want these clients to affect the QoS of other clients sharing the same server. In this improvement, we are going to add a per-connection throttling mechanism which will slow down the network activity of clients with high bandwidth usage. We will add a configurable parameter to limit the maximum bandwidth used to serve client requests. All clients get an equal amount of bandwidth when the system approaches its network capacity limit. When the system is under-utilized, throttling has no effect. |
236011 | No Perforce job exists for this issue. | 0 | 41963 | 7 years, 49 weeks, 3 days ago | 0|i07jq7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
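The equal-share policy proposed in this issue can be illustrated with a tiny allocation function. This is a sketch of the proposal only, with a made-up name, not ZooKeeper code:

```c
/* Per-connection bandwidth cap, following the proposal: while aggregate
 * demand stays under the configured capacity, throttling has no effect
 * (signaled here by returning -1); once demand exceeds capacity, every
 * connection is limited to an equal share of the capacity. */
static long per_conn_limit_bps(long capacity_bps, long total_demand_bps, int nconns) {
    if (nconns <= 0 || total_demand_bps <= capacity_bps)
        return -1; /* under-utilized: no throttling */
    return capacity_bps / nconns; /* equal share near capacity */
}
```

For example, with 1000 units of capacity and 4 connections, demand of 500 is left unthrottled, while demand of 4000 caps each connection at 250.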
| ZooKeeper | ZOOKEEPER-1446 | C API makes it difficult to implement a timed wait_until_connected method correctly |
Bug | Open | Minor | Unresolved | Unassigned | Stephen Tyree | Stephen Tyree | 12/Apr/12 15:59 | 30/May/12 16:30 | 3.4.3, 3.3.5 | c client | 0 | 1 | When using the C API, one might feel inclined to create a zookeeper_wait_until_connected method which waits for some amount of time for a connected state event to occur. The code might look like the following (didn't actually compile this):
//------
static pthread_mutex_t kConnectedMutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t kConnectedCondvar = PTHREAD_COND_INITIALIZER;

int zookeeper_wait_until_connected(zhandle_t* zk, const struct timespec* timeout)
{
    struct timespec abstime;
    clock_gettime(CLOCK_REALTIME, &abstime);
    abstime.tv_sec += timeout->tv_sec;
    abstime.tv_nsec += timeout->tv_nsec;
    pthread_mutex_lock(&kConnectedMutex);
    if (zoo_state(zk) == ZOO_CONNECTED_STATE) {
        pthread_mutex_unlock(&kConnectedMutex);
        return 1;
    }
    pthread_cond_timedwait(&kConnectedCondvar, &kConnectedMutex, &abstime);
    int state = zoo_state(zk);
    pthread_mutex_unlock(&kConnectedMutex);
    return (state == ZOO_CONNECTED_STATE);
}

void zookeeper_session_callback(zhandle_t* zh, int type, int state, const char* path, void* arg)
{
    pthread_mutex_lock(&kConnectedMutex);
    if (type == ZOO_SESSION_EVENT && state == ZOO_CONNECTED_STATE) {
        pthread_cond_broadcast(&kConnectedCondvar);
    }
    pthread_mutex_unlock(&kConnectedMutex);
}
//-----
That would work fine (assuming I didn't screw anything up), except that pthread_cond_timedwait can spuriously wake up, making you not actually wait the desired timeout. The solution to this is to loop until the condition is met, which might look like the following:
//---
int state = zoo_state(zk);
int result = 0;
while ((state == ZOO_CONNECTING_STATE || state == ZOO_ASSOCIATING_STATE) && result != ETIMEDOUT) {
    result = pthread_cond_timedwait(&kConnectedCondvar, &kConnectedMutex, &abstime);
    state = zoo_state(zk);
}
//---
That would work fine, except the state might be valid and connecting, yet not ZOO_CONNECTING_STATE or ZOO_ASSOCIATING_STATE; it might be 0 or, as implemented recently courtesy of ZOOKEEPER-1108, 999.
Checking for those states causes your code to rely upon an implementation detail of zookeeper, a problem highlighted by that implementation detail changing recently. Is there any way this behavior can become documented (via a ZOO_INITIALIZED_STATE or something like that), or is there any way this behavior can be supported by the library itself? |
235595 | No Perforce job exists for this issue. | 1 | 32554 | 7 years, 43 weeks, 1 day ago | 0|i05xo7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1445 | Add support for binary data for zktreeutil |
Improvement | Open | Major | Unresolved | Thawan Kooburat | Thawan Kooburat | Thawan Kooburat | 10/Apr/12 13:20 | 05/Feb/20 07:16 | 3.4.3 | 3.7.0, 3.5.8 | contrib | 0 | 2 | zktreeutil does not support binary data. The program will fail to import/export znode's data which are in binary format. We are going to use OpenSSL library to perform Base64 encoding so that we can store it XML format. OpenSSL seems to be the only widely available library which as support for Base64 encoding and decoding. Libxml2 only have encoding support. | 235273 | No Perforce job exists for this issue. | 2 | 2512 | 5 years, 51 weeks, 3 days ago | 0|i00s9r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1444 | Idle session-less connections never time out |
Bug | Resolved | Critical | Duplicate | Jay Shrauner | Jay Shrauner | Jay Shrauner | 09/Apr/12 21:08 | 27/Jul/12 01:17 | 27/Jul/12 01:17 | 3.3.2, 3.4.3, 3.5.0 | server | 0 | 2 | A socket connection to the server on which a session is not created will never time out. A misbehaving client that opens and leaks connections without creating sessions will hold open file descriptors on the server. The existing timeout code is implemented at the session level, but the servers also should track and expire connections at the connection level. Proposed solution is to pull the timeout data structure handling code (hashmap of expiry time to sets of objects, simple monotonically incrementing nextExpirationTime) from SessionTrackerImpl into its own class in order to share it with connection level timeouts to be implemented in NIOServerCnxnFactory. Connections can be assigned a small initial timeout (proposing something small, like 3s) until a session is created, at which point the ServerCnxn session timeout can be used instead. |
235167 | No Perforce job exists for this issue. | 2 | 32555 | 7 years, 38 weeks ago | Expire idle connections. | 0|i05xof: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
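The timeout structure this report proposes to factor out of SessionTrackerImpl (a map from expiry time to sets of objects, with a monotonically advancing nextExpirationTime) hinges on rounding each deadline up to the next tick boundary so that many timeouts collapse into one bucket. A sketch of just that arithmetic, under the assumption it matches SessionTrackerImpl's rounding:

```c
#include <stdint.h>

/* Round a deadline up to the next expiration-interval boundary. All objects
 * whose deadlines round to the same boundary share one expiry bucket, so the
 * tracker only needs a map from boundary -> set of objects and can expire a
 * whole bucket at a time as nextExpirationTime advances. */
static int64_t round_to_interval(int64_t deadline_ms, int64_t interval_ms) {
    return (deadline_ms / interval_ms + 1) * interval_ms;
}
```

With a 1000 ms tick, a deadline of 1001 ms lands in the 2000 ms bucket and 2500 ms in the 3000 ms bucket; a deadline exactly on a boundary still rounds up, so every bucket is strictly in the future.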
| ZooKeeper | ZOOKEEPER-1443 | API docs for trunk returns 404 |
Bug | Resolved | Major | Duplicate | Patrick D. Hunt | Michi Mutsuzaki | Michi Mutsuzaki | 04/Apr/12 19:05 | 09/Oct/13 02:46 | 09/Oct/13 02:46 | documentation | 0 | 0 | The "API Docs" link is broken in trunk. http://zookeeper.apache.org/doc/trunk/ http://zookeeper.apache.org/doc/trunk/api/index.html |
234575 | No Perforce job exists for this issue. | 0 | 32556 | 6 years, 24 weeks, 1 day ago | 0|i05xon: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1442 | Uncaught exception handler should exit on a java.lang.Error |
Bug | Open | Minor | Unresolved | Jeremy Stribling | Jeremy Stribling | Jeremy Stribling | 04/Apr/12 13:09 | 29/Jul/17 10:34 | 3.4.3, 3.3.5 | java client, server | 0 | 3 | The uncaught exception handler registered in NIOServerCnxnFactory and ClientCnxn simply logs exceptions and lets the rest of ZooKeeper go on its merry way. However, errors such as OutOfMemoryErrors should really crash the program, as they represent unrecoverable errors. If the exception that gets to the uncaught exception handler is an instanceof a java.lang.Error, ZK should exit with an error code (in addition to logging the error). | 234532 | No Perforce job exists for this issue. | 3 | 32557 | 2 years, 33 weeks, 5 days ago | 0|i05xov: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1441 | Some test cases are failing because Port bind issue. |
Test | Closed | Major | Fixed | Andor Molnar | kavita sharma | kavita sharma | 03/Apr/12 08:15 | 20/May/19 13:50 | 23/Nov/18 05:52 | 3.6.0, 3.5.5 | server, tests | 0 | 4 | 0 | 13800 | ZOOKEEPER-2135 | Very frequently, test cases are failing because of:
java.net.BindException: Address already in use
	at sun.nio.ch.Net.bind(Native Method)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:111)
	at org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:112)
	at org.apache.zookeeper.server.quorum.QuorumPeer.<init>(QuorumPeer.java:514)
	at org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:156)
	at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:103)
	at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:67)
This may be because of port assignment, so please give me some suggestions if someone else is also facing the same problem. |
100% | 100% | 13800 | 0 | flaky, flaky-test, pull-request-available | 234308 | No Perforce job exists for this issue. | 0 | 32558 | 1 year, 16 weeks, 6 days ago |
Reviewed
|
0|i05xp3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1440 | Spurious log error messages when QuorumCnxManager is shutting down |
Bug | Resolved | Minor | Fixed | Jordan Zimmerman | Jordan Zimmerman | Jordan Zimmerman | 01/Apr/12 16:28 | 12/Mar/14 19:32 | 12/Mar/14 17:45 | 3.4.3 | 3.5.0 | quorum, server | 0 | 5 | When shutting down the QuroumPeer, ZK server logs unnecessary errors. See QuorumCnxManager.Listener.run() - ss.accept() will throw an exception when it is closed. The catch (IOException e) will log errors. It should first check the shutdown field to see if the Listener is being shutdown. If it is, the exception is correct and no errors should be logged. | 234094 | No Perforce job exists for this issue. | 2 | 32559 | 6 years, 2 weeks, 1 day ago | 0|i05xpb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1439 | c sdk: core in log_env for lack of checking the output argument *pwp* of getpwuid_r |
Bug | Resolved | Major | Fixed | Yubing Yin | Yubing Yin | Yubing Yin | 01/Apr/12 04:15 | 27/Apr/12 07:00 | 26/Apr/12 06:59 | 3.4.3, 3.3.5 | 3.5.0 | c client | 1 | 1 | 3600 | 3600 | 0% | linux | The man page for getpwuid_r says it "return[s] a pointer to a passwd structure, or NULL if the matching entry is not found or an error occurs" and "The getpwnam_r() and getpwuid_r() functions return zero on success." This means the entry may not be found even when getpwuid_r succeeds. In log_env of zookeeper.c in the C sdk: {{if (!getpwuid_r(uid, &pw, buf, sizeof(buf), &pwp)) {}} {{LOG_INFO(("Client environment:user.home=%s", pw.pw_dir));}} {{}}} pwp is not checked to ensure the entry was found; pw.pw_dir is not initialized in this case, and a core dump happens in LOG_INFO. |
0% | 0% | 3600 | 3600 | zookeeper | 234062 | No Perforce job exists for this issue. | 1 | 32560 | 7 years, 47 weeks, 6 days ago |
Reviewed
|
sdk | 0|i05xpj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
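The defensive pattern this report asks for can be shown in a small self-contained sketch (the `home_dir` helper and its `"<NA>"` fallback are made up for illustration; only the getpwuid_r contract is from the report). The key point: getpwuid_r() can return 0 (success) while setting *pwp to NULL when no entry matches, so the result pointer must be checked in addition to the return value.

```c
#include <pwd.h>
#include <stdio.h>
#include <unistd.h>

/* Copy the current user's home directory into `out`, or "<NA>" when the
 * password entry is missing. Checking `pwp != NULL` as well as the return
 * value is exactly the fix the report describes: on success-with-no-entry,
 * pw.pw_dir is uninitialized and must not be read. */
static const char *home_dir(char *out, size_t outlen) {
    struct passwd pw;
    struct passwd *pwp = NULL;
    char buf[4096];
    if (getpwuid_r(getuid(), &pw, buf, sizeof(buf), &pwp) == 0 && pwp != NULL) {
        snprintf(out, outlen, "%s", pw.pw_dir);
    } else {
        snprintf(out, outlen, "<NA>"); /* entry not found, or an error */
    }
    return out;
}
```

Under this pattern, log_env would log a placeholder rather than dereferencing an uninitialized pw.pw_dir.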
| ZooKeeper | ZOOKEEPER-1438 | JMX MBeans for client connections can be orphaned |
Bug | Open | Minor | Unresolved | Unassigned | Todd Lipcon | Todd Lipcon | 30/Mar/12 00:49 | 30/Mar/12 00:49 | 3.4.2 | jmx | 0 | 2 | I have a functional test that extends from ClientBase, which I'm using to stress test a piece of software that uses ZK underneath. In this test, I want to simulate disconnection events, so I fire up a thread which calls "serverFactory.closeAll()" every 50ms. The clients themselves churn through a lot of sessions as part of the test. When the test completes, the ClientBase teardown method fails, since it sees one or two MBeans "left over" from earlier elapsed sessions. | 233878 | No Perforce job exists for this issue. | 0 | 32561 | 7 years, 51 weeks, 6 days ago | 0|i05xpr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1437 | Client uses session before SASL authentication complete |
Bug | Resolved | Major | Fixed | Eugene Joseph Koontz | Thomas Weise | Thomas Weise | 28/Mar/12 22:09 | 18/Feb/16 07:31 | 09/Sep/12 14:24 | 3.4.3 | 3.4.4, 3.5.0 | java client | 0 | 19 | ZOOKEEPER-1455, ZOOKEEPER-1561, HBASE-5780, HBASE-7771, ZOOKEEPER-1764, ZOOKEEPER-938, ZOOKEEPER-1547, HADOOP-8315 | Found issue in the context of hbase region server startup, but can be reproduced w/ zkCli alone. getData may occur prior to SaslAuthenticated and fail with NoAuth. This is not expected behavior when the client is configured to use SASL. |
233695 | No Perforce job exists for this issue. | 18 | 32562 | 4 years, 5 weeks ago |
Reviewed
|
0|i05xpz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1436 | Add ZOO_TIMED_OUT_STATE session event to notify client about timeout during reconnection |
Improvement | Open | Major | Unresolved | Thawan Kooburat | Thawan Kooburat | Thawan Kooburat | 28/Mar/12 21:07 | 31/May/12 21:41 | 3.4.3 | c client | 1 | 3 | The zookeeper c client knows how long its session will last, and periodically pings in order to keep that session alive. However, if it loses connection, it hops from ensemble member to ensemble member trying to reform the session - even after the session timeout expires. This patch adds a new session event (ZOO_TIMED_OUT_STATE) that notifies the user that the session timeout has passed and we have been unable to reconnect. The event is one-shot per disconnection and is generated by the C client library itself. The server has no knowledge of this event. Example use cases: 1. Client can try to reconnect to a different set of observers if it is unable to connect to the original set of observers. 2. Client can quickly stop acting as an active server, since another server may have already taken over the active role while it is trying to reconnect. |
patch | 233691 | No Perforce job exists for this issue. | 2 | 41964 | 7 years, 42 weeks, 6 days ago | 0|i07jqf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1435 | cap space usage of default log4j rolling policy |
Improvement | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 28/Mar/12 12:33 | 29/Mar/12 21:08 | 29/Mar/12 21:08 | 3.4.3, 3.3.5, 3.5.0 | 3.5.0 | scripts | 0 | 0 | HADOOP-8149 | Our current log4j log rolling policy (for ROLLINGFILE) doesn't cap the max logging space used. This can be a problem in production systems. See similar improvements recently made in hadoop: HADOOP-8149 For ROLLINGFILE only, I believe we should change the default threshold to INFO and cap the max space to something reasonable, say 5g (max file size of 256mb, max file count of 20). These will be the defaults in log4j.properties, which you would also be able to override from the command line. |
233615 | No Perforce job exists for this issue. | 1 | 12505 | 7 years, 51 weeks, 6 days ago |
Reviewed
|
0|i02hxb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
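For reference, a bounded ROLLINGFILE configuration along the lines this issue proposes might look like the following in log4j.properties. The values are illustrative only (256MB x 20 backups, roughly the 5g cap mentioned), and the file path and layout pattern are placeholders, not ZooKeeper's shipped defaults:

```properties
# ROLLINGFILE with capped disk usage (illustrative values)
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=INFO
log4j.appender.ROLLINGFILE.File=zookeeper.log
log4j.appender.ROLLINGFILE.MaxFileSize=256MB
log4j.appender.ROLLINGFILE.MaxBackupIndex=20
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
```

MaxFileSize x (MaxBackupIndex + 1) bounds the total space the appender can consume, which is the cap the issue asks for.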
| ZooKeeper | ZOOKEEPER-1434 | zkCli crashes with NPE on stat of non-existent path |
Bug | Resolved | Major | Won't Fix | Hartmut Lang | Wing Yew Poon | Wing Yew Poon | 26/Mar/12 20:06 | 26/Mar/12 20:41 | 26/Mar/12 20:39 | 3.3.5 | java client | 0 | 0 | In the command line client (zkCli.sh), when I do
{noformat}
stat /non-existent
{noformat}
the client crashes with
{noformat}
Exception in thread "main" java.lang.NullPointerException
	at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:130)
	at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722)
	at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581)
	at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353)
	at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311)
	at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270)
{noformat} |
233306 | No Perforce job exists for this issue. | 1 | 32563 | 8 years, 2 days ago | 0|i05xq7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1433 | improve ZxidRolloverTest (test seems flakey) |
Improvement | Resolved | Major | Fixed | Patrick D. Hunt | Wing Yew Poon | Wing Yew Poon | 26/Mar/12 16:00 | 30/Mar/12 07:07 | 29/Mar/12 21:08 | 3.3.5 | 3.3.6, 3.4.4, 3.5.0 | tests | 0 | 0 | In our jenkins job to run the ZooKeeper unit tests, org.apache.zookeeper.server.ZxidRolloverTest sometimes fails. E.g.,
{noformat}
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /foo0
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:815)
	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:843)
	at org.apache.zookeeper.server.ZxidRolloverTest.checkNodes(ZxidRolloverTest.java:154)
	at org.apache.zookeeper.server.ZxidRolloverTest.testRolloverThenRestart(ZxidRolloverTest.java:211)
{noformat} |
233273 | No Perforce job exists for this issue. | 2 | 12507 | 7 years, 51 weeks, 6 days ago |
Reviewed
|
0|i02hxr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1432 | Add javadoc and debug logging for checkACL() method in PrepRequestProcessor |
Improvement | Resolved | Major | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 23/Mar/12 19:04 | 26/Apr/12 04:38 | 26/Apr/12 04:38 | 3.5.0 | 3.5.0 | server | 0 | 0 | I have a need for more logging in the checkACL() method and thought I'd add a javadoc section for the function too, while I am there. | security | 233003 | No Perforce job exists for this issue. | 4 | 33280 | 7 years, 49 weeks, 5 days ago | 0|i0625j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1431 | zkpython: async calls leak memory |
Bug | Resolved | Major | Fixed | Kapil Thangavelu | johan rydberg | johan rydberg | 23/Mar/12 03:11 | 19/Jun/12 07:00 | 18/Jun/12 20:24 | 3.4.3 | 3.3.6, 3.4.4, 3.5.0 | contrib-bindings | 1 | 7 | 3600 | 3600 | 0% | RHEL 6.0, self-built from 3.3.3 sources | I'm seeing a memory leakage when using the "aget" method. It leaks tuples and dicts, both containing "stats". |
0% | 0% | 3600 | 3600 | 232854 | No Perforce job exists for this issue. | 4 | 32564 | 7 years, 40 weeks, 2 days ago | 0|i05xqf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1430 | add maven deploy support to the build |
Task | Closed | Blocker | Fixed | Giridharan Kesavan | Patrick D. Hunt | Patrick D. Hunt | 22/Mar/12 13:28 | 13/Mar/14 14:17 | 19/Dec/13 20:30 | 3.4.4, 3.5.0 | 3.4.6, 3.5.0 | build | 0 | 6 | ZOOKEEPER-1686, INFRA-4565 | Infra is phasing out the current mechanism we use to deploy maven artifacts. We need to move to repository.apache.org (nexus). In particular we need to ensure the following artifacts get published: * zookeeper-3.x.y.jar * zookeeper-3.x.y-sources.jar * zookeeper-3.x.y-tests.jar * zookeeper-3.x.y-javadoc.jar In 3.4.4/3.4.5 we missed the tests jar which caused headaches for downstream users, such as Hadoop. |
232736 | No Perforce job exists for this issue. | 13 | 41965 | 6 years, 2 weeks ago |
Reviewed
|
0|i07jqn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1429 | Response packet caching for get request |
Improvement | Open | Minor | Unresolved | Unassigned | Thawan Kooburat | Thawan Kooburat | 21/Mar/12 13:58 | 21/Mar/12 13:58 | 3.4.3 | server | 1 | 1 | Motivation: In our scalability testing, we have a large number of clients watching for data changes. All of them fetch data immediately when a watch is fired. We found that GC consumes a significant amount of CPU time in this scenario. In our prototype, we added packet caching for getData() requests and found that GC time reduced by 40%. The GC we used is the Concurrent Mark Sweep collector. Design and Implementation: Similar to our prototype, we plan to add packet caching for getData() requests using LRU caching. The cache stores serialized responses (data + stat) as ByteBuffers indexed by pathname. A cache entry is invalidated when a set request affects its data. The data structure that we plan to use for the LRU cache is CacheBuilder [http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html] because it provides many tunable features that we may use in the future. Currently, the eviction policy will be based on memory size. Otherwise, we can implement it using LinkedHashMap if we do not want to rely on an external library. |
performance | 232580 | No Perforce job exists for this issue. | 0 | 41966 | 8 years, 1 week, 1 day ago | 0|i07jqv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
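The LinkedHashMap fallback mentioned in the description can be sketched as follows. This is an illustration, not the proposed implementation: eviction here is by entry count rather than the memory-size policy the issue suggests, and a plain String stands in for the serialized ByteBuffer response:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Access-ordered LinkedHashMap that evicts the least-recently-used cached
// response once capacity is exceeded.
public class ResponseCacheSketch {
    private final int capacity;
    private final LinkedHashMap<String, String> cache;

    public ResponseCacheSketch(int capacity) {
        this.capacity = capacity;
        // accessOrder=true makes get() refresh an entry's recency.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > ResponseCacheSketch.this.capacity;
            }
        };
    }

    public String get(String path) { return cache.get(path); }

    public void put(String path, String serializedResponse) {
        cache.put(path, serializedResponse);
    }

    // A set request on the path must invalidate the cached response.
    public void invalidate(String path) { cache.remove(path); }

    public int size() { return cache.size(); }
}
```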
| ZooKeeper | ZOOKEEPER-1428 | Create command line tool to utilize the new classes introduced in ZOOKEEPER-271 |
New Feature | Open | Major | Unresolved | Unassigned | Ted Yu | Ted Yu | 20/Mar/12 13:01 | 08/Jul/13 13:27 | java client, scripts | 0 | 3 | See discussion entitled 'ZOOKEEPER-1059 Was: Does the rolling-restart.sh script work?' on zookeeper-dev. HBase bin/rolling-restart.sh depends on zkcli returning a non-zero exit code for a non-existing znode. Jonathan Hsieh found that rolling-restart.sh no longer works using zookeeper 3.4.x. From Patrick Hunt: I think what we need is a tool that's intended for use both programmatically and by humans, with stricter requirements about input, output formatting, command handling, etc. Please see the work Hartmut has been doing as part of 271 on trunk (3.5.0). Perhaps we can augment these new classes to also support such a tool. However it should instead be a true command line tool, rather than a shell. |
newbie | 232393 | No Perforce job exists for this issue. | 0 | 41967 | 6 years, 37 weeks, 3 days ago | 0|i07jr3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1427 | Writing to local files is done non-atomically |
Bug | Resolved | Critical | Fixed | Patrick D. Hunt | Todd Lipcon | Todd Lipcon | 19/Mar/12 17:25 | 18/Jul/12 07:01 | 17/Jul/12 17:22 | 3.4.3 | 3.4.4, 3.5.0 | server | 0 | 10 | Currently, the writeLongToFile() function opens the file for truncate, writes the new data, syncs, and then closes. If the process crashes after opening the file but before writing the new data, the file may be left empty, causing ZK to "forget" an earlier promise. Instead, it should use RandomAccessFile to avoid truncating. | 232251 | No Perforce job exists for this issue. | 6 | 12503 | 7 years, 36 weeks, 1 day ago |
Reviewed
|
0|i02hwv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
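The RandomAccessFile approach the report suggests can be sketched as follows. This is a standalone illustration, not the actual patch: opening in "rw" mode does not truncate, so a crash between open and write leaves the previous value intact instead of an empty file. (The real ZooKeeper code writes the value as text; this sketch writes a fixed 8-byte binary record for simplicity.)

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Non-truncating, in-place overwrite of a fixed-size value, synced to disk
// before returning. Standalone sketch only.
public class AtomicLongFileSketch {
    static void writeLongToFile(File f, long value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(0);
            raf.writeLong(value);   // fixed 8-byte record, overwritten in place
            raf.getFD().sync();     // durable before returning
        }
    }

    static long readLongFromFile(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            return raf.readLong();
        }
    }
}
```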
| ZooKeeper | ZOOKEEPER-1426 | add version command to the zookeeper server |
Improvement | Resolved | Major | Fixed | Peter Szecsi | Patrick D. Hunt | Patrick D. Hunt | 18/Mar/12 02:34 | 21/Jun/19 07:22 | 31/May/19 16:16 | 3.3.5 | 3.6.0 | scripts, server | 1 | 6 | 0 | 11400 | Add a version command to the zkServer.sh. Hadoop does this by having a special main class: org.apache.hadoop.util.VersionInfo We could do something similar, hook it into our current version information class (perhaps add main to that class). Would also need to add a new "version" command to zkServer.sh that calls this. |
100% | 100% | 11400 | 0 | newbie, patch, pull-request-available | 232104 | No Perforce job exists for this issue. | 4 | 2511 | 41 weeks, 6 days ago | 0|i00s9j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
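The Hadoop-style approach described above (a dedicated version main class hooked into the shell script) could look like this hypothetical sketch; the class name, fields, and version string are illustrative, not the class ZooKeeper eventually shipped:

```java
// Hypothetical VersionInfo-style entry point, analogous to
// org.apache.hadoop.util.VersionInfo. All names here are illustrative.
public class VersionInfoSketch {
    // In a real build these would be filled in at compile time.
    private static final String VERSION = "3.6.0";
    private static final String BUILT = "unknown";

    public static String getVersion() { return VERSION; }

    public static void main(String[] args) {
        // A "version" command in zkServer.sh would exec this main class.
        System.out.println("ZooKeeper " + getVersion() + ", built " + BUILT);
    }
}
```

On the script side, `zkServer.sh version` would simply run `java -cp ... VersionInfoSketch` and exit.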
| ZooKeeper | ZOOKEEPER-1425 | add version command to the zookeeper client shell |
Improvement | Resolved | Major | Fixed | maoling | Patrick D. Hunt | Patrick D. Hunt | 18/Mar/12 02:31 | 20/May/19 22:36 | 20/May/19 16:55 | 3.6.0 | java client, scripts | 0 | 2 | 0 | 2400 | the client shell is missing a version command. Should return the version e.g. "3.5.0" | 100% | 100% | 2400 | 0 | pull-request-available | 232103 | No Perforce job exists for this issue. | 1 | 41968 | 43 weeks, 2 days ago | 0|i07jrb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1424 | ZooKeeper will not allow a client to delete a tree when it should allow it |
Bug | Open | Major | Unresolved | Unassigned | Mihai Claudiu Toader | Mihai Claudiu Toader | 16/Mar/12 17:15 | 06/Oct/14 09:47 | 3.4.2 | server | 0 | 3 | ZOOKEEPER-2052 | Linux ubuntu 11.10, Zookeeper 3.4.2, One server, Two Java clients | Hi all, While using zookeeper at midokura we hit an interesting bug in zookeeper. We hit it sporadically while developing some functional tests, so I had to build a test case for it. I finally created the test case and I think I narrowed down the conditions under which it happens. So I wanted to let you know my findings since they are somewhat troublesome. We need: - one running zookeeper server (didn't test this with a cluster); let's name this: the server - one running zookeeper client that will create an ephemeral node under the tree created by the next client; let's name this: the ephemeral client - one running zookeeper client that will create a persistent tree and try to delete that tree; let's name this: the persistent client What needs to happen is this: step 1. - the server starts step 2. - the persistent client connects and creates a tree step 3. - the ephemeral client connects and adds an ephemeral node under the tree created by the persistent client step 4. - the persistent client tries to delete the tree recursively (without including the ephemeral node in the multi op) step 5. - the ephemeral client crashes hard (the equivalent of kill -9) step 6. - the persistent client tries to delete the tree recursively again (and fails with NoEmptyNode even though when we list the node we don't see any children) - the zookeeper server needs to be restarted in order for this to work. Step 4 is critical in the sense that if we don't have it (there is no previous error trying to remove a tree), then the next steps behave as we would expect them to (aka pass). Also no amount of fiddling with zookeeper connection timeouts (between zookeeper and the ephemeral node) will help. 
If the ephemeral client is shut down properly, it seems like everything behaves properly (even with step 4). The test code is available here: https://github.com/mtoadermido/play It needs a zookeeper 3.4.2 installed on the system (it uses the installed jars from the deb to spawn the zookeeper server). The entry point is https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java There is a lot of boilerplate since I didn't want it to depend on stuff from midonet, but the interesting part is the BlockingBug.main() method. It will launch a zookeeper process, an external ephemeral client process, and after that act as the second client. Available tweaks: - the zookeeper client timeout for the ephemeral client here: https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L56 - step 4 here (set to true / false): https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L69 - the shutdown of the ephemeral client (soft aka clean shutdown, hard aka kill -9): https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L88 The result is displayed depending on whether the final recursive deletion succeeded or not: "We hit it !. The clear tree failed." https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L103 "No error :(" https://github.com/mtoadermido/play/blob/master/src/main/java/com/midokura/tests/zookeeper/BlockingBug.java#L99 The conclusion is that the bug seems to be inside the zookeeper codebase and it's prone to being triggered by this particular usage of zookeeper combined with the misfortune of having to kill the ephemeral process hard. |
232004 | No Perforce job exists for this issue. | 1 | 32565 | 8 years, 1 week, 2 days ago | 0|i05xqn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1423 | 4lw and jmx should expose the size of the datadir/datalogdir |
Improvement | Resolved | Major | Fixed | Edward Ribeiro | Patrick D. Hunt | Patrick D. Hunt | 16/Mar/12 13:03 | 13/Jul/15 00:04 | 13/Jul/15 00:04 | 3.5.0 | 3.5.1, 3.6.0 | jmx | 0 | 5 | There are no metrics currently available on the size of the datadir/datalogdir. These grow w/o bound unless the cleanup script is run. It would be good to expose these metrics through jmx/4lw such that monitoring can be done on the size. Would key ppl in on whether cleanup was actually running. In particular this could be monitored/alerted on by third party systems (nagios, ganglia and the like). | newbie | 231983 | No Perforce job exists for this issue. | 8 | 41969 | 4 years, 36 weeks, 3 days ago | 0|i07jrj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1422 | Support _HOST substitution in JAAS configuration |
Improvement | Resolved | Major | Implemented | Mark Fenes | Thomas Weise | Thomas Weise | 15/Mar/12 22:03 | 18/Jan/18 11:54 | 18/Jan/18 11:54 | 3.4.0 | 2 | 4 | ZOOKEEPER-938, HBASE-4791 | At the moment a JAAS configuration file needs to be created with the Kerberos principal specified as user/host. It would be much easier for deployment automation if the host portion could be resolved at startup time, as supported in Hadoop (something like user/_HOST instead of user/hostname). A configuration alternative to global JAAS conf would be even better (via direct properties in zoo.cfg?). |
231864 | No Perforce job exists for this issue. | 0 | 41970 | 2 years, 9 weeks ago | 0|i07jrr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
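The requested _HOST expansion, modeled on what Hadoop does at startup, can be sketched as below. The method name and exact behavior (lowercasing, only substituting a literal "_HOST" component) are illustrative assumptions, not ZooKeeper's eventual implementation:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of _HOST substitution in a Kerberos principal, e.g.
// "zookeeper/_HOST@REALM" -> "zookeeper/node1.example.com@REALM".
public class PrincipalSketch {
    // Deterministic variant for testing: hostname is passed in.
    static String resolvePrincipal(String principal, String hostname) {
        String[] parts = principal.split("[/@]");
        if (parts.length == 3 && parts[1].equals("_HOST")) {
            return parts[0] + "/" + hostname.toLowerCase() + "@" + parts[2];
        }
        return principal;   // nothing to substitute
    }

    // At startup the host portion comes from the local canonical hostname.
    static String resolvePrincipal(String principal) throws UnknownHostException {
        return resolvePrincipal(principal,
                InetAddress.getLocalHost().getCanonicalHostName());
    }
}
```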
| ZooKeeper | ZOOKEEPER-1421 | Support for hierarchical ACLs |
Improvement | Open | Major | Unresolved | Unassigned | Thomas Weise | Thomas Weise | 15/Mar/12 21:47 | 12/Apr/15 20:07 | server | 3 | 7 | Using ZK as a service, we need to restrict access to subtrees owned by different tenants. Currently there is no support for hierarchical ACLs, so it is necessary to configure the clients not only with their parent node, but also manage the ACL for each new node created in the subtree. With support for hierarchical ACLs, duplication could be avoided and the setup of the parent nodes with ACL and subsequent control of the same could be split into a separate administrative task. |
231862 | No Perforce job exists for this issue. | 0 | 41971 | 4 years, 49 weeks, 4 days ago | 0|i07jrz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1420 | Kerberos principal to user mapping / authorization |
Improvement | Open | Major | Unresolved | Unassigned | Thomas Weise | Thomas Weise | 15/Mar/12 21:21 | 17/May/12 14:41 | 3.4.0 | server | 1 | 3 | ZOOKEEPER-938, ZOOKEEPER-1467 | ZOOKEEPER-938 introduces server configuration options to perform a rudimentary mapping from Kerberos principal to user name: kerberos.removeHostFromPrincipal kerberos.removeRealmFromPrincipal Those are sufficient to make things work for HBase and other server clusters where we cannot include the host name portion into the znode ACL, but it would be better to support a more standard approach to perform the mapping with finer grained control (i.e. do this only for specific matching principals). Mapping in Hadoop: https://ccp.cloudera.com/display/CDHDOC/Appendix+C+-+Configuring+the+Mapping+from+Kerberos+Principals+to+Short+Names As an alternative, a matching option at the time of ACL check that can be controlled by the process assigning ACLs to znodes could also serve the purpose. For example, principals: user/host1@TEST.DOMAIN user/host2@TEST.DOMAIN would have access to a znode with ACL set as: sasl:user/host*@TEST.DOMAIN:cdrwa This would not require ZK server configuration, but add more runtime overhead. |
231860 | No Perforce job exists for this issue. | 0 | 41972 | 8 years, 1 week, 6 days ago | 0|i07js7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
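The rudimentary mapping the issue refers to (kerberos.removeHostFromPrincipal / kerberos.removeRealmFromPrincipal from ZOOKEEPER-938) has semantics along these lines. This is an illustration of the described behavior, not the actual server code:

```java
// Strips the host and/or realm component from a Kerberos principal:
// "user/host1@TEST.DOMAIN" with both flags on -> "user".
public class PrincipalMappingSketch {
    static String mapPrincipal(String principal,
                               boolean removeHost, boolean removeRealm) {
        String p = principal;
        if (removeRealm) {
            int at = p.indexOf('@');
            if (at >= 0) p = p.substring(0, at);
        }
        if (removeHost) {
            int slash = p.indexOf('/');
            if (slash >= 0) {
                int at = p.indexOf('@');
                // Drop just the "/host" component, keeping any realm suffix.
                p = (at > slash) ? p.substring(0, slash) + p.substring(at)
                                 : p.substring(0, slash);
            }
        }
        return p;
    }
}
```

Under this mapping, principals user/host1@TEST.DOMAIN and user/host2@TEST.DOMAIN both collapse to "user", which is what lets a znode ACL omit the host portion.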
| ZooKeeper | ZOOKEEPER-1419 | Leader election never settles for a 5-node cluster |
Bug | Resolved | Blocker | Fixed | Flavio Paiva Junqueira | Jeremy Stribling | Jeremy Stribling | 15/Mar/12 20:07 | 19/Mar/12 21:19 | 19/Mar/12 21:19 | 3.4.3, 3.5.0 | 3.4.4, 3.5.0 | leaderElection | 0 | 1 | 64-bit Linux, all nodes running on the same machine (different ports) | We have a situation where it seems to my untrained eye that leader election never finishes for a 5-node cluster. In this test, all nodes are ZK 3.4.3 and running on the same server (listening on different ports, of course). The nodes have server IDs of 0, 1, 2, 3, 4. The test brings up the cluster in different configurations, adding in a new node each time. We embed ZK in our application, so when we shut a node down and restart it with a new configuration, it all happens in a single JVM process. Here's our server startup code (for the case where there's more than one node in the cluster): {code} if (servers.size() > 1) { _log.debug("Starting Zookeeper server in quorum server mode"); _quorum_peer = new QuorumPeer(); synchronized(_quorum_peer) { _quorum_peer.setClientPortAddress(clientAddr); _quorum_peer.setTxnFactory(log); _quorum_peer.setQuorumPeers(servers); _quorum_peer.setElectionType(_election_alg); _quorum_peer.setMyid(_server_id); _quorum_peer.setTickTime(_tick_time); _quorum_peer.setInitLimit(_init_limit); _quorum_peer.setSyncLimit(_sync_limit); QuorumVerifier quorumVerifier = new QuorumMaj(servers.size()); _quorum_peer.setQuorumVerifier(quorumVerifier); _quorum_peer.setCnxnFactory(_cnxn_factory); _quorum_peer.setZKDatabase(new ZKDatabase(log)); _quorum_peer.start(); } } else { _log.debug("Starting Zookeeper server in single server mode"); _zk_server = new ZooKeeperServer(); _zk_server.setTxnLogFactory(log); _zk_server.setTickTime(_tick_time); _cnxn_factory.startup(_zk_server); } {code} And here's our shutdown code: {code} if (_quorum_peer != null) { synchronized(_quorum_peer) { _quorum_peer.shutdown(); FastLeaderElection fle = (FastLeaderElection) _quorum_peer.getElectionAlg(); 
fle.shutdown(); try { _quorum_peer.getTxnFactory().commit(); } catch (java.nio.channels.ClosedChannelException e) { // ignore } } } else { _cnxn_factory.shutdown(); _zk_server.getTxnLogFactory().commit(); } {code} The test steps through the following scenarios in quick succession: Run 1: Start a 1-node cluster, servers=[0] Run 2: Start a 2-node cluster, servers=[0,3] Run 3: Start a 3-node cluster, servers=[0,1,3] Run 4: Start a 4-node cluster, servers=[0,1,2,3] Run 5: Start a 5-node cluster, servers=[0,1,2,3,4] It appears that run 5 never elects a leader -- the nodes just keep spewing messages like this (example from node 0): {noformat} 2012-03-14 16:23:12,775 13308 [WorkerSender[myid=0]] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - There is a connection already for server 2 2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Sending Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 3 (recipient), 0 (myid), 0x2 (n.peerEpoch) 2012-03-14 16:23:12,776 13309 [WorkerSender[myid=0]] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - There is a connection already for server 3 2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Sending Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 4 (recipient), 0 (myid), 0x2 (n.peerEpoch) 2012-03-14 16:23:12,776 13309 [WorkerSender[myid=0]] DEBUG org.apache.zookeeper.server.quorum.QuorumCnxManager - There is a connection already for server 4 2012-03-14 16:23:12,776 13309 [WorkerReceiver[myid=0]] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. 
My id = 0 2012-03-14 16:23:12,776 13309 [WorkerReceiver[myid=0]] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 4 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state) 2012-03-14 16:23:12,776 13309 [WorkerReceiver[myid=0]] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Receive new notification message. My id = 0 2012-03-14 16:23:12,776 13309 [WorkerReceiver[myid=0]] INFO org.apache.zookeeper.server.quorum.FastLeaderElection - Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state) 2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - Adding vote: from=1, proposed leader=3, proposed zxid=0x0, proposed election epoch=0x1 2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 3, proposed id: 3, zxid: 0x0, proposed zxid: 0x0 2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 3, proposed id: 3, zxid: 0x0, proposed zxid: 0x0 2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 3, proposed id: 3, zxid: 0x0, proposed zxid: 0x0 2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 4, proposed id: 3, zxid: 0x0, proposed zxid: 0x0 2012-03-14 16:23:12,776 13309 [QuorumPeer[myid=0]/127.0.0.1:2900] DEBUG org.apache.zookeeper.server.quorum.FastLeaderElection - id: 4, proposed id: 3, zxid: 0x0, proposed zxid: 0x0 {noformat} I'm guessing this means that nodes 3 and 4 are fighting over leadership, but I don't know enough about the leader election code to debug this any further. 
Attaching a tarball with the logs for each run and the data directories for each node (though I don't think any data is being written to ZK during the test). |
231852 | No Perforce job exists for this issue. | 4 | 12509 | 8 years, 1 week, 2 days ago | 0|i02hy7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1418 | Just a bug in the tutorial code on the website |
Bug | Open | Minor | Unresolved | Joe Gamache | Joe Gamache | Joe Gamache | 15/Mar/12 17:11 | 22/Mar/12 10:49 | 3.4.3 | documentation | 0 | 0 | When I ran the Queue example from here: http://zookeeper.apache.org/doc/trunk/zookeeperTutorial.html The producer created entries of the form: /app1/element0000000001... but the consumer tried to consume entries of the form: /app1/element1... Adding a patch with the file attached. |
231819 | No Perforce job exists for this issue. | 1 | 32566 | 8 years, 1 week ago | 0|i05xqv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
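The mismatch above ("element0000000001" vs "element1") is the classic sequential-znode pitfall: the server appends a zero-padded 10-digit counter, so a consumer must parse that suffix numerically instead of rebuilding names by hand. A sketch of the consumer-side logic (the child names are illustrative):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Pick the lowest-numbered sequential child by parsing the 10-digit suffix.
public class QueueConsumerSketch {
    static int sequenceOf(String child) {
        // Last 10 characters are the sequence counter appended by the server.
        return Integer.parseInt(child.substring(child.length() - 10));
    }

    static String smallestChild(List<String> children) {
        return children.stream()
                .min(Comparator.comparingInt(QueueConsumerSketch::sequenceOf))
                .orElse(null);
    }

    public static void main(String[] args) {
        List<String> kids = Arrays.asList(
                "element0000000003", "element0000000001", "element0000000002");
        System.out.println(smallestChild(kids));   // prints "element0000000001"
    }
}
```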
| ZooKeeper | ZOOKEEPER-1417 | investigate differences in client last zxid handling btw c and java clients |
Bug | Resolved | Major | Fixed | Thawan Kooburat | Patrick D. Hunt | Patrick D. Hunt | 15/Mar/12 13:05 | 06/Jun/13 13:21 | 06/Jun/13 12:54 | 3.4.0 | 3.5.0 | c client, java client | 0 | 5 | ZOOKEEPER-1412 | In ZOOKEEPER-1412 it was identified that the c and java clients handle updating the last zxid seen a bit differently. ZOOKEEPER-1412 fixed a bug associated with this, however there are still some differences that should be investigated. | 231776 | No Perforce job exists for this issue. | 2 | 32567 | 6 years, 42 weeks ago | 0|i05xr3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1416 | Persistent Recursive Watch |
Improvement | Resolved | Major | Fixed | Jordan Zimmerman | Phillip Liu | Phillip Liu | 14/Mar/12 18:52 | 11/Nov/19 21:24 | 08/Nov/19 11:30 | 3.6.0 | c client, documentation, java client, server | 22 | 30 | 1814400 | 1750800 | 63600 | 3% | ZOOKEEPER-2871 | ZOOKEEPER-3611 | h4. The Problem A ZooKeeper Watch can be placed on a single znode and when the znode changes a Watch event is sent to the client. If there are thousands of znodes being watched, when a client (re)connects, it would have to send thousands of watch requests. At Facebook, we have this problem storing information for thousands of db shards. Consequently a naming service that consumes the db shard definition issues thousands of watch requests each time the service starts and changes client watcher. h4. Proposed Solution We add the notion of a Persistent Recursive Watch in ZooKeeper. Persistent means no Watch reset is necessary after a watch-fire. Recursive means the Watch applies to the node and descendant nodes. A Persistent Recursive Watch behaves as follows: # Recursive Watch supports all Watch semantics: CHILDREN, DATA, and EXISTS. # CHILDREN and DATA Recursive Watches can be placed on any znode. # EXISTS Recursive Watches can be placed on any path. # A Recursive Watch behaves like an auto-watch registrar on the server side. Setting a Recursive Watch means to set watches on all descendant znodes. # When a watch on a descendant fires, no subsequent event is fired until a corresponding getData(..) on the znode is called; then the Recursive Watch automatically re-applies the watch on the znode. This maintains the existing Watch semantic on an individual znode. # A Recursive Watch overrides any watches placed on a descendant znode. Practically this means the Recursive Watch Watcher callback is the one receiving the event and the event is delivered exactly once. A goal here is to reduce the number of semantic changes. The guarantee of no intermediate watch event until data is read will be maintained. 
The only difference is we will automatically re-add the watch after a read. At the same time we add the convenience of reducing the need to add multiple watches for sibling znodes and in turn reduce the number of watch messages sent from the client to the server. There are some implementation details that need to be hashed out. Initial thinking is to have the Recursive Watch create per-node watches. This will cause a lot of watches to be created on the server side. Currently, each watch is stored as a single bit in a bit set relative to a session - up to 3 bits per client per znode. If there are 100m znodes with 100k clients, each watching all nodes, then this strategy will consume approximately 3.75TB of ram distributed across all Observers. Seems expensive. Alternatively, a blacklist of paths to not send Watches for, regardless of Watch setting, can be set each time a watch event from a Recursive Watch is fired. The memory utilization is relative to the number of outstanding reads and in the worst case it's 1/3 * 3.75TB using the parameters given above. Otherwise, a relaxation of the no-intermediate-watch-event-until-read guarantee is required. If the server can send watch events regardless of whether one has already been fired without a corresponding read, then the server can simply fire watch events without tracking. |
3% | 3% | 63600 | 1750800 | 1814400 | pull-request-available | 231654 | No Perforce job exists for this issue. | 2 | 41973 | 18 weeks, 6 days ago | 0|i07jsf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
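The 3.75TB estimate quoted in the description checks out arithmetically: 3 watch bits per client per znode, 100 million znodes, 100 thousand clients watching everything.

```java
// Verifies the memory estimate from the issue: 100m znodes * 100k clients
// * 3 bits each = 3e13 bits = 3.75e12 bytes = 3.75 TB (decimal terabytes).
public class WatchMemorySketch {
    static double terabytesForWatches(long znodes, long clients, long bitsPerWatch) {
        double bits = (double) znodes * clients * bitsPerWatch;
        double bytes = bits / 8.0;
        return bytes / 1e12;
    }

    public static void main(String[] args) {
        System.out.println(terabytesForWatches(100_000_000L, 100_000L, 3)); // 3.75
    }
}
```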
| ZooKeeper | ZOOKEEPER-1415 | Zookeeper broadcasts host's hostname instead of IP when ecf.exported.containerfactoryargs property is not set |
Bug | Open | Minor | Unresolved | Unassigned | Stefano Ghio | Stefano Ghio | 14/Mar/12 15:06 | 14/Mar/12 15:06 | 2 | 1 | Any OS, any Java version. The issue presents itself when using the osgi bundles org.apache.hadoop.zookeeper and org.eclipse.ecf.provider.zookeeper inside an Eclipse Equinox framework. I did not test on any other versions. | Not setting the ecf.exported.containerfactoryargs property when publishing an OSGi service through Zookeeper results in the service being published under the host's hostname instead of its IP. This means that hosts not able to correctly resolve that hostname cannot connect to its ZooKeeper instance. It would be desirable to use the IP instead of the hostname when that property is purposely left blank e.g. when it is unknown where the application will be deployed. | osgi | 231619 | No Perforce job exists for this issue. | 0 | 32568 | 8 years, 2 weeks, 1 day ago | osgi | 0|i05xrb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1414 | ZOOKEEPER-1833 QuorumPeerMainTest.testQuorum, testBadPackets are failing intermittently |
Sub-task | Closed | Minor | Fixed | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 14/Mar/12 10:48 | 13/Mar/14 14:17 | 09/Jan/14 14:24 | 3.4.3, 3.5.0 | 3.4.6, 3.5.0 | server, tests | 0 | 4 | The QuorumPeerMainTest.testQuorum, testBadPackets testcases are failing intermittently due to a wrong ZKClient usage pattern. Saw the following ConnectionLoss on the 3.4 version: {noformat} KeeperErrorCode = ConnectionLoss for /foo_q1 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /foo_q1 at org.apache.zookeeper.KeeperException.create(KeeperException.java:90) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:657) at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testBadPackets(QuorumPeerMainTest.java:212) {noformat} Since the ZooKeeper connection happens asynchronously through ClientCnxn, the client should wait for the 'KeeperState.SyncConnected' event before using it. But these test cases do not wait for the connection, e.g.: {noformat} ZooKeeper zk = new ZooKeeper("127.0.0.1:" + CLIENT_PORT_QP1, ClientBase.CONNECTION_TIMEOUT, this); zk.create("/foo_q1", "foobar1".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT); {noformat} |
test | 231558 | No Perforce job exists for this issue. | 1 | 41974 | 6 years, 2 weeks ago | 0|i07jsn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
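The usage pattern these tests were missing (block until the async connect completes) is conventionally done with a CountDownLatch released from the Watcher on SyncConnected. A standalone sketch with the ZooKeeper client replaced by a stub thread, since the real pattern needs the zookeeper jar:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of the wait-for-SyncConnected pattern. A real Watcher would count
// the latch down when it sees Event.KeeperState.SyncConnected; here a stub
// thread stands in for ClientCnxn's async connection establishment.
public class ConnectLatchSketch {
    static boolean awaitConnected(CountDownLatch connected, long timeoutMs)
            throws InterruptedException {
        return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch connected = new CountDownLatch(1);

        // Stand-in for the async handshake finishing.
        new Thread(connected::countDown).start();

        // The test must block here before calling create()/exists().
        if (!awaitConnected(connected, 5000)) {
            throw new IllegalStateException("timed out waiting for SyncConnected");
        }
        System.out.println("connected; safe to issue requests");
    }
}
```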
| ZooKeeper | ZOOKEEPER-1413 | Use on-disk transaction log for learner sync up |
Improvement | Resolved | Minor | Fixed | Thawan Kooburat | Thawan Kooburat | Thawan Kooburat | 13/Mar/12 17:42 | 14/Oct/13 11:55 | 01/Jul/13 13:22 | 3.4.3 | 3.5.0 | server | 0 | 11 | ZOOKEEPER-1777, ZOOKEEPER-876, ZOOKEEPER-1709, ZOOKEEPER-1710 | Motivation: The learner syncs up with the leader by retrieving the committed log from the leader. Currently, the leader only keeps 500 entries of recently committed log in memory. If the learner falls behind by more than 500 updates, the leader will send the entire snapshot to the learner. With the size of the snapshot for some of our Zookeeper deployments (~10G), it is prohibitively expensive to send the entire snapshot over the network. Additionally, our Zookeeper may serve more than 4K updates per second. As a result, a network hiccup of less than a second will cause the learner to use snapshot transfer. Design: Instead of looking only at the committed log in memory, the leader will also look at the transaction log on disk. The amount of transaction log kept on disk is configurable and the current default is 100k. This will allow Zookeeper to tolerate longer temporary network failures before initiating a snapshot transfer. Implementation: We plan to add an interface to the persistence layer which can be used to retrieve proposals from the on-disk transaction log. These proposals can then be sent to the learner using the existing protocol. |
performance, quorum | 231470 | No Perforce job exists for this issue. | 9 | 41975 | 6 years, 23 weeks, 3 days ago | 0|i07jsv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1412 | java client watches inconsistently triggered on reconnect |
Bug | Resolved | Blocker | Fixed | Patrick D. Hunt | Botond Hejj | Botond Hejj | 12/Mar/12 05:13 | 04/Jun/12 19:33 | 15/Mar/12 13:06 | 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2, 3.4.3 | 3.3.5, 3.4.4, 3.5.0 | server | 0 | 6 | ZOOKEEPER-1417 | I've observed inconsistent behavior in java client watches. The inconsistency relates to the behavior after the client reconnects to the zookeeper ensemble. After the client reconnects to the ensemble, only those watches should trigger which would also have been triggered if the connection had not been lost. This means if I watch for changes in node /foo and there is no change there, then my watch should not be triggered on reconnecting to the ensemble. This is not always the case in the java client. I've debugged the issue and I could locate the case where the watch is always triggered on reconnect. This consistently happens if I connect to a follower in the ensemble and I don't do any operation which goes through the leader. Looking at the code I see that the client stores the lastzxid and sends that with its requests. This is 0 on startup and will be updated every time from the server replies. This lastzxid is also sent to the server after reconnect together with the watches. The server decides which watches to trigger based on this lastzxid, probably because that should reflect the last known state of the client. If this lastzxid is 0 then all the watches are triggered. I've checked why this lastzxid is 0. I thought it shouldn't be, since there was already a request to the server to set the watch, and in the reply the server could have sent back the zxid, but it turns out that it sends just 0. Looking at the server code I see that for requests which don't go through the leader, the follower server just sends back the same zxid that the client sent. |
231227 | No Perforce job exists for this issue. | 6 | 12510 | 7 years, 42 weeks, 3 days ago |
Reviewed
|
0|i02hyf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
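The fix ZOOKEEPER-1412 describes hinges on a simple server-side rule: on reconnect, a watch should fire only if the watched node changed after the client's last-seen zxid, and a spurious lastzxid of 0 makes every watch fire. A self-contained sketch of that filtering rule (the `lastModifiedZxid` map is a hypothetical stand-in for the server's DataTree state, not actual ZooKeeper code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the reconnect rule from ZOOKEEPER-1412: a watch on `path`
// should fire only if the node's last-modified zxid is newer than the
// client's lastZxid. `lastModifiedZxid` is a hypothetical stand-in for
// the server's per-node state.
class WatchReplay {
    static List<String> watchesToTrigger(Map<String, Long> lastModifiedZxid,
                                         List<String> watchedPaths,
                                         long clientLastZxid) {
        List<String> fire = new ArrayList<>();
        for (String path : watchedPaths) {
            long mzxid = lastModifiedZxid.getOrDefault(path, 0L);
            // With clientLastZxid == 0 (the bug: followers echo back 0),
            // every watch fires; with the real lastZxid only changed nodes do.
            if (mzxid > clientLastZxid) {
                fire.add(path);
            }
        }
        return fire;
    }
}
```

With a real lastzxid of 7, only nodes modified after zxid 7 trigger; with the buggy 0, everything does.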
| ZooKeeper | ZOOKEEPER-1411 | ZOOKEEPER-107 Consolidate membership management, distinguish between static and dynamic configuration parameters |
Sub-task | Resolved | Major | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 08/Mar/12 20:02 | 01/May/13 22:30 | 02/Apr/13 02:33 | 3.5.0 | server | 0 | 7 | ZOOKEEPER-1660, ZOOKEEPER-1540, ZOOKEEPER-1625, ZOOKEEPER-107 | Currently every server has a different static configuration file. This patch distinguishes between dynamic parameters, which are now in a separate "dynamic configuration file", and static parameters which are in the usual file. The config file points to the dynamic config file by specifying "dynamicConfigFile=...". In the first stage (this patch), all cluster membership definitions are in the dynamic config file, but in the future additional parameters may be moved to the dynamic file. Backward compatibility makes sure that you can still use a single config file if you'd like. Only when the config is changed (once ZK-107 is in) is a dynamic file automatically created and the necessary parameters moved to it. This patch also moves all membership parsing and management into the QuorumVerifier classes, and removes QuorumPeer.quorumPeers. The cluster membership is contained in QuorumPeer.quorumVerifier. QuorumVerifier was expanded and now has methods such as getAllMembers(), getVotingMembers(), getObservingMembers(). |
230933 | No Perforce job exists for this issue. | 12 | 33281 | 6 years, 51 weeks, 2 days ago |
Reviewed
|
0|i0625r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1410 | ZOOKEEPER-1407 Support GetData and GetChildren in Multi for C client |
Sub-task | Open | Major | Unresolved | Unassigned | Ted Yu | Ted Yu | 08/Mar/12 19:52 | 10/May/14 04:58 | c client | 0 | 0 | This is task for C client portion of ZOOKEEPER-1407 | 230931 | No Perforce job exists for this issue. | 0 | 41976 | 8 years, 2 weeks, 6 days ago | 0|i07jt3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1409 | CLI: deprecate ls2 command |
Improvement | Resolved | Minor | Duplicate | Hartmut Lang | Hartmut Lang | Hartmut Lang | 08/Mar/12 15:26 | 02/Apr/14 16:09 | 02/Apr/14 16:09 | 3.5.0 | java client | 0 | 2 | ZOOKEEPER-271, ZOOKEEPER-1408 | In the CLI, mark the ls2 command as deprecated. Instead, add a -s option to the ls command. The options for ls would be: ls [-s] [-w] path (-s: stat, -w: watch) |
230891 | No Perforce job exists for this issue. | 2 | 41977 | 5 years, 51 weeks, 1 day ago | 0|i07jtb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1408 | CLI: sort output of ls command |
Improvement | Resolved | Minor | Fixed | Hartmut Lang | Hartmut Lang | Hartmut Lang | 08/Mar/12 15:16 | 02/Apr/14 16:09 | 28/Mar/14 23:34 | 3.5.0 | java client | 0 | 3 | ZOOKEEPER-271, ZOOKEEPER-1409 | Sort the output of the ls command in the CLI and remove the [] frame. Example: change the output of "ls /" from [test1, aa3, zkc1, aa2, aa1, zookeeper] to aa1, aa2, aa3, test1, zkc1, zookeeper |
230889 | No Perforce job exists for this issue. | 4 | 41978 | 5 years, 51 weeks, 5 days ago | The output of ls-command in CLI does not contain the []-frame any more. Instead the nodes are sorted. |
Incompatible change
|
0|i07jtj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
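The change ZOOKEEPER-1408 describes amounts to sorting the child names and joining them without the list's default `[...]` frame. A minimal illustration (the method name `formatChildren` is ours, not the actual CLI code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative version of the ls output change from ZOOKEEPER-1408:
// sort the child names and join them with ", " instead of printing
// the list's default [a, b, c] representation.
class LsFormat {
    static String formatChildren(List<String> children) {
        List<String> sorted = new ArrayList<>(children); // don't mutate the caller's list
        Collections.sort(sorted);
        return String.join(", ", sorted);
    }
}
```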
| ZooKeeper | ZOOKEEPER-1407 | Support GetData and GetChildren in Multi |
Improvement | Resolved | Major | Workaround | Ted Yu | Ted Yu | Ted Yu | 07/Mar/12 11:05 | 11/Sep/19 16:32 | 14/Jun/19 20:39 | java client, server | 4 | 7 | ZOOKEEPER-1410, ZOOKEEPER-3361 | There is a use case where GetData and GetChildren would participate in Multi. We should add support for this case. |
1% | 16200 | 1193400 | 1209600 | 230709 | No Perforce job exists for this issue. | 5 | 41979 | 39 weeks, 5 days ago | ZOOKEEPER-3402 should resolve this | 0|i07jtr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1406 | dpkg init scripts don't restart - missing check_priv_sep_dir |
Bug | Resolved | Major | Fixed | Chris Beauchamp | Chris Beauchamp | Chris Beauchamp | 06/Mar/12 04:14 | 18/Mar/12 07:00 | 18/Mar/12 02:50 | 3.4.3 | 3.4.4, 3.5.0 | scripts | 0 | 1 | Linux Ubuntu 10.04 (lucid) - presumably affects Debian too, but not tested here | The included init.d script for dpkg creation doesn't restart. It exits with the following error: {quote} \# /etc/init.d/zookeeper restart /etc/init.d/zookeeper: 127: check_privsep_dir: not found {quote} Also, the actual zkServer.sh line in restart has a path of .../bin/ rather than .../sbin/ |
230488 | No Perforce job exists for this issue. | 1 | 32569 | 8 years, 1 week, 4 days ago |
Reviewed
|
0|i05xrj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1405 | leader election recipe sample code - dispatchEvent invocations can get out of order |
Bug | Open | Major | Unresolved | Unassigned | Robert Varga | Robert Varga | 05/Mar/12 08:42 | 06/Mar/12 07:10 | 3.4.3 | recipes | 0 | 3 | Since the process method is not synchronized in org.apache.zookeeper.recipes.election.LeaderElectionSupport, there is a race condition where events coming in from the watch may overtake the events dispatched during the start method. A solution to ensure that events dispatched during the start method are handled before any watch-based events is to make the process method synchronized. |
230364 | No Perforce job exists for this issue. | 0 | 32570 | 8 years, 3 weeks, 2 days ago | 0|i05xrr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1404 | leader election pseudo code probably incorrect |
Bug | Resolved | Major | Fixed | Unassigned | Robert Varga | Robert Varga | 05/Mar/12 07:05 | 14/Dec/12 17:11 | 14/Dec/12 17:11 | 3.4.3 | documentation | 0 | 4 | ZOOKEEPER-1483 | The pseudo code for leader election in the recipes.html page of 3.4.3 documentation is the following... {quote} Let ELECTION be a path of choice of the application. To volunteer to be a leader: 1.Create znode z with path "ELECTION/guid-n_" with both SEQUENCE and EPHEMERAL flags; 2.Let C be the children of "ELECTION", and i be the sequence number of z; 3.Watch for changes on "ELECTION/guid-n_j", where j is the {color:red}*smallest*{color} sequence number such that j < i and n_j is a znode in C; Upon receiving a notification of znode deletion: 1.Let C be the new set of children of ELECTION; 2.If z is the smallest node in C, then execute leader procedure; 3.Otherwise, watch for changes on "ELECTION/guid-n_j", where j is the {color:red}*smallest*{color} sequence number such that j < i and n_j is a znode in C; {quote} I think, in both third steps *highest* should appear instead of {color:red}*smallest*{color}. |
230354 | No Perforce job exists for this issue. | 0 | 32571 | 7 years, 14 weeks, 6 days ago | 0|i05xrz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
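With the correction from ZOOKEEPER-1404 applied, each contender watches the znode with the *largest* sequence number that is still smaller than its own. A self-contained sketch of that selection rule (pure sequence-number logic, not real ZooKeeper calls):

```java
import java.util.List;
import java.util.OptionalInt;

// Selection rule from the corrected recipe in ZOOKEEPER-1404: among the
// children's sequence numbers, watch the largest j with j < i. If no such
// j exists, this contender holds the smallest number and is the leader.
class ElectionWatch {
    static OptionalInt watchTarget(List<Integer> childSeqs, int mySeq) {
        return childSeqs.stream()
                .mapToInt(Integer::intValue)
                .filter(j -> j < mySeq)
                .max(); // empty => we are the leader
    }
}
```

Watching only the immediate predecessor (rather than the smallest child) is what avoids the herd effect when a contender disappears.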
| ZooKeeper | ZOOKEEPER-1403 | zkCli.sh script quoting issue |
Bug | Resolved | Minor | Fixed | James Page | James Page | James Page | 02/Mar/12 06:12 | 18/Mar/12 07:00 | 18/Mar/12 03:04 | 3.3.4, 3.4.3 | 3.3.6, 3.4.4, 3.5.0 | scripts | 0 | 1 | Ubuntu/Debian | The zkCli.sh script included with zookeeper doesn't quote its parameters correctly when passing them on to the java program. This causes issues with arguments with spaces and such. |
230102 | No Perforce job exists for this issue. | 1 | 32572 | 8 years, 1 week, 4 days ago |
Reviewed
|
0|i05xs7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1402 | Upload Zookeeper package to Maven Central |
Improvement | Resolved | Minor | Done | Flavio Paiva Junqueira | Igor Lazebny | Igor Lazebny | 01/Mar/12 11:32 | 08/Oct/15 13:13 | 23/Sep/15 12:41 | 3.3.4 | 3.4.7 | 4 | 9 | It would be great to make Zookeeper package available in Maven Central as other Apache projects do (Camel, CXF, ActiveMQ, Karaf, etc). That would simplify usage of this package in maven builds. |
229990 | No Perforce job exists for this issue. | 0 | 41980 | 4 years, 24 weeks ago | this is just a jute plugin (perhaps we can open up the github it sits in). It made the maven pom file slightly cleaner and is a good template used by the next major patch that takes the previous mavenization patch and moves folders to match maven's expected structure. This takes out much of the custom config (but this patch is not yet 100% complete; still tweaking to handle all cases 100% for sure). putting both up for examination to see how people feel about moving to the maven structure (I show one way to use the modules also) |
0|i07jtz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1401 | Extract generally useful client utilities from CLI code |
Improvement | Open | Major | Unresolved | Unassigned | Thomas Weise | Thomas Weise | 24/Feb/12 13:12 | 17/Mar/12 15:59 | java client | 0 | 0 | HIVE-2712 | There are a bunch of things that would be useful/reusable from ZK Java client, such as ACL parsing. Also, it would be nice to see other utilities for dealing with path creation ("mkdir -p ...") readily available for clients rather than implementing in downstream projects. Some of this can be seen in HIVE-2712. |
229263 | No Perforce job exists for this issue. | 0 | 41981 | 8 years, 1 week, 5 days ago | 0|i07ju7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1400 | Allow logging via callback instead of raw FILE pointer |
Improvement | Resolved | Major | Fixed | Michi Mutsuzaki | Marshall McMullen | Marshall McMullen | 23/Feb/12 17:18 | 21/Aug/13 07:06 | 21/Aug/13 05:41 | 3.5.0 | 3.5.0 | c client | 0 | 5 | Linux | The existing logging framework inside the C client uses a raw FILE*. Using a FILE* is very limiting and potentially dangerous. A safer alternative is to just provide a callback that the C client will call for each message. In our environment, we saw some really nasty issues with multiple threads all connecting to zookeeper via the C Client related to the use of a raw FILE*. Specifically, if the FILE * is closed and that file descriptor is reused by the kernel before the C client is notified, then the C client will use its static global logStream pointer for subsequent logging messages. That FILE* is now a loose cannon! In our environment, we saw zookeeper log messages ending up in other sockets and even in our core data path. Clearly this is dangerous. In our particular case, we'd omitted a call to zoo_set_log_stream(NULL) to notify the C client that the FILE* has been closed. However, even with that bug fixed, there's still a race condition where log messages in flight may be sent before the C client is notified of the FILE closure, and the same problem can happen. Other issues we've seen involved multiple threads, wherein one would close the FILE*, and that's a global change that affects all threads connected within that process. That's a pretty nasty limitation as well. My proposed change is to allow setting a callback for log messages. A callback is used in preference to a raw FILE*. If no callback is set, then it will fall back to the existing FILE*. If that's not set, then it falls back to stderr as it always has. While refactoring this code, I removed the need for the double parens in all the LOG macros as that wasn't necessary and didn't fit with my new approach. |
229151 | No Perforce job exists for this issue. | 8 | 2585 | 6 years, 31 weeks, 1 day ago | 0|i00spz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
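The fallback chain proposed in ZOOKEEPER-1400 (callback, then user stream, then stderr) can be sketched in Java rather than the C client's actual macros; the names `logCallback` and `logStream` below are illustrative stand-ins, not the real API:

```java
import java.io.PrintStream;
import java.util.function.Consumer;

// Fallback chain from ZOOKEEPER-1400, sketched in Java: prefer a
// user-supplied callback, then a user-supplied stream, then stderr.
// A callback avoids the dangling-FILE* hazard: the sink owns its own
// lifetime instead of the client holding a raw stream pointer.
class LogSink {
    static volatile Consumer<String> logCallback = null;
    static volatile PrintStream logStream = null;

    static void log(String msg) {
        Consumer<String> cb = logCallback;   // snapshot: avoid racing a reset
        if (cb != null) { cb.accept(msg); return; }
        PrintStream s = logStream;
        (s != null ? s : System.err).println(msg);
    }
}
```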
| ZooKeeper | ZOOKEEPER-1399 | Binary Jar in zookeeper-3.3.4 displays wrong version when run |
Bug | Open | Minor | Unresolved | Unassigned | Mike Lundy | Mike Lundy | 22/Feb/12 21:16 | 22/Feb/12 21:16 | 3.3.4 | build, server | 0 | 0 | When you start up zookeeper using the jar in zookeeper-3.3.4.tar.gz, it prints a 3.3.3 version string: server.ZooKeeperServer - Server environment:zookeeper.version=3.3.3-1203054, built on 11/17/2011 05:47 GMT server.ZooKeeperServer - Server environment:java.class.path=/usr/lib/zookeeper/apache-rat-tasks-0.6.jar:/usr/lib/zookeeper/commons-lang-2.4.jar:/usr/lib/zookeeper/commons-cli-1.1.jar:/usr/lib/zookeeper/log4j-1.2.15.jar:/usr/lib/zookeeper/commons-collections-3.2.jar:/usr/lib/zookeeper/apache-rat-core-0.6.jar:/usr/lib/zookeeper/jline-0.9.94.jar:/usr/lib/zookeeper/zookeeper-3.3.4.jar:/etc/zookeeper I assume this is due to a build problem of some form. (Rebuilding the jar from the tarball fixes the version). |
229029 | No Perforce job exists for this issue. | 0 | 32573 | 8 years, 5 weeks ago | 0|i05xsf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1398 | zkpython corrupts session passwords that contain nulls |
Bug | Open | Critical | Unresolved | Mike Lundy | Mike Lundy | Mike Lundy | 22/Feb/12 14:10 | 25/Sep/14 08:56 | 3.3.4 | c client, contrib-bindings | 0 | 3 | If the session password contains a nul character (\0), it will be mutated as it is passed to python. zkpython currently uses the ParseArgs flag that stops on nul. | 228972 | No Perforce job exists for this issue. | 1 | 32574 | 5 years, 26 weeks ago | 0|i05xsn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1397 | Remove BookKeeper documentation links |
Improvement | Resolved | Major | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 22/Feb/12 03:20 | 17/Mar/12 19:11 | 17/Mar/12 18:16 | 3.5.0 | 0 | 0 | BookKeeper is now a subproject and its documentation is maintained in the site of the subproject. Consequently, we should remove the links in the zookeeper documentation pages or at least point to the documentation of the subproject site. | 228878 | No Perforce job exists for this issue. | 1 | 33282 | 8 years, 1 week, 5 days ago | 0|i0625z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1396 | Create zoo_append API |
Improvement | Open | Minor | Unresolved | Unassigned | Stephen Tyree | Stephen Tyree | 21/Feb/12 20:36 | 21/Feb/12 20:36 | c client, java client, server | 0 | 0 | I was trying to append data to a znode from the C library and I realized the workflow for that is pretty unfortunate. Essentially you need to do the following: - call zoo_exists to get the Stat structure which contains the data length of the znode - Allocate that many bytes plus how many you are adding to the znode dynamically in a buffer - call zoo_get to get the data for the znode - append the new data to the end of your local buffer - call zoo_set to set the data back into the znode If the data changes between the zoo_get and the zoo_set, sorry! You have to start from scratch. For a case where multiple consumers are trying to append data to a znode, this can become a nuisance. If there existed a zoo_append API, the workflow would become: - call zoo_append to append the data into the znode - If that fails, call zoo_set to create the znode with the data Assuming zoo_append wouldn't create the znode. This would mean fewer round trips against the server and simpler code. Even the Java library, which wouldn't need to worry about calling zoo_exists, would have one fewer round trip in the typical case. Is this a typical workflow for people? Would anyone else find this API valuable? |
228843 | No Perforce job exists for this issue. | 0 | 41982 | 8 years, 5 weeks, 1 day ago | 0|i07juf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
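Absent a zoo_append API, the read-modify-write loop above can at least be made safe against concurrent writers by passing the version from the read into a conditional set and retrying on a version conflict. A sketch of that optimistic retry, using a hypothetical in-memory versioned cell as a stand-in for the znode (not the real client API):

```java
import java.util.concurrent.atomic.AtomicReference;

// Optimistic append via compare-and-set on a versioned value, mimicking
// zoo_get (read data + version) followed by zoo_set with that version.
// Versioned and `node` are hypothetical stand-ins for znode state.
class OptimisticAppend {
    static final class Versioned {
        final String data; final int version;
        Versioned(String data, int version) { this.data = data; this.version = version; }
    }
    static final AtomicReference<Versioned> node = new AtomicReference<>(new Versioned("", 0));

    static void append(String suffix) {
        while (true) {
            Versioned cur = node.get();                       // "zoo_get": data + version
            Versioned next = new Versioned(cur.data + suffix, cur.version + 1);
            if (node.compareAndSet(cur, next)) return;        // "zoo_set" with version check
            // version mismatch: another writer got in first, start over
        }
    }
}
```

The retry loop is exactly the "start from scratch" the reporter complains about; a server-side append would collapse it into one round trip.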
| ZooKeeper | ZOOKEEPER-1395 | node-watcher double-free redux |
Bug | Resolved | Critical | Fixed | Mike Lundy | Mike Lundy | Mike Lundy | 21/Feb/12 15:01 | 25/Apr/12 19:37 | 30/Mar/12 18:31 | 3.3.4 | 3.3.6, 3.4.4, 3.5.0 | c client, contrib-bindings | 0 | 1 | This is basically the same issue as ZOOKEEPER-888 and ZOOKEEPER-740 (the latter is open as I write this, but it was superseded by the fix that went in with 888). The problem still exists after the ZOOKEEPER-888 patch, however; it's just more difficult to trigger: 1) Zookeeper notices connection loss, schedules watcher_dispatch 2) Zookeeper notices session loss, schedules watcher_dispatch 3) watcher_dispatch runs for connection loss 4) pywatcher is freed due to is_unrecoverable being true 5) watcher_dispatch runs for session loss 6) PyObject_CallObject attempts to run freed pywatcher with varying bad results The fix is easy, the dispatcher should act on the state it is given, not the state of the world when it runs. (Patch attached). Reliably triggering the crash is tricky due to the race, but it's not theoretical. |
228790 | No Perforce job exists for this issue. | 2 | 32575 | 7 years, 51 weeks, 5 days ago | 0|i05xsv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1394 | ClassNotFoundException on shutdown of client |
Bug | Resolved | Minor | Not A Problem | wu wen | Herman Meerlo | Herman Meerlo | 21/Feb/12 08:26 | 17/Feb/17 08:44 | 30/Oct/16 22:22 | 3.4.2 | java client | 1 | 5 | ZOOKEEPER-2618 | ZOOKEEPER-2618, ZOOKEEPER-1816, ZOOKEEPER-2697 | OS X 10.7 java version "1.6.0_29" | When close() is called on the ZooKeeper instance from a ContextListener (contextDestroyed) there is no way to synchronize with the fact that the EventThread and SendThread have actually finished their work. The problem lies in the SendThread which makes a call to ZooTrace when it exits, but that class has not been loaded yet. Because the ContextListener could not synchronize with the death of the threads the classloader has already disappeared, resulting in a ClassNotFoundException. My personal opinion is that the close() method should probably wait until the event and send thread have actually died. |
228730 | No Perforce job exists for this issue. | 1 | 32576 | 3 years, 20 weeks, 3 days ago | 0|i05xt3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1393 | ZooKeeper client exists() javadoc incorrectly states watcher(s) will be triggered on node deletion |
Bug | Resolved | Minor | Invalid | Unassigned | Gary Malouf | Gary Malouf | 15/Feb/12 14:11 | 25/Feb/12 13:41 | 25/Feb/12 13:41 | 3.3.4, 3.4.2 | java client | 0 | 1 | 1200 | 1200 | 0% | I found it very misleading that the javadoc for the exists() calls that take a boolean or a Watcher state that 'The watch will be triggered by a successful operation that creates/delete the node or sets the data on the node.' What I've seen from descriptions of bugs (older, but this references it: http://zookeeper-user.578899.n2.nabble.com/Exists-Watch-Triggered-by-Delete-td1490893.html) and my own personal usage is that watchers set on exists() are triggered when a non-existing node is now created or an existing node is changed. They are NOT triggered when the node already exists and is deleted. http://zookeeper.apache.org/doc/r3.4.3/api/index.html |
0% | 0% | 1200 | 1200 | 228019 | No Perforce job exists for this issue. | 0 | 32577 | 8 years, 4 weeks, 5 days ago | 0|i05xtb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1392 | Should not allow to read ACL when not authorized to read node |
Bug | Closed | Major | Fixed | Bruce Gao | Thomas Weise | Thomas Weise | 12/Feb/12 20:45 | 02/Apr/19 06:40 | 06/Feb/19 09:40 | 3.4.2 | 3.6.0, 3.5.5, 3.4.14 | server | 0 | 6 | Not authorized to read, yet still able to list ACL: [zk: localhost:2181(CONNECTED) 0] getAcl /sasltest/n4 'sasl,'notme@EXAMPLE.COM : cdrwa [zk: localhost:2181(CONNECTED) 1] get /sasltest/n4 Exception in thread "main" org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /sasltest/n4 at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131) at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160) at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:711) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282) |
227630 | No Perforce job exists for this issue. | 1 | 32578 | 1 year, 6 weeks, 1 day ago | 0|i05xtj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1391 | zkCli dies on NoAuth |
Bug | Resolved | Major | Duplicate | Hartmut Lang | Thomas Weise | Thomas Weise | 12/Feb/12 20:41 | 26/Apr/12 04:47 | 26/Apr/12 04:47 | 3.4.2 | 3.5.0 | java client | 0 | 1 | ZOOKEEPER-1307 | [zk: localhost:2181(CONNECTED) 1] create /sasltest/n4 c sasl:notme@EXAMPLE.COM:cdrwa Created /sasltest/n4 [zk: localhost:2181(CONNECTED) 2] ls /sasltest/n4 Exception in thread "main" org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /sasltest/n4 at org.apache.zookeeper.KeeperException.create(KeeperException.java:113) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1448) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1476) at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:717) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282) |
227629 | No Perforce job exists for this issue. | 2 | 32579 | 7 years, 48 weeks ago | 0|i05xtr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1390 | some expensive debug code not protected by a check for debug |
Improvement | Resolved | Major | Fixed | Benjamin Reed | Benjamin Reed | Benjamin Reed | 10/Feb/12 00:12 | 17/Mar/12 12:07 | 17/Mar/12 12:07 | 3.4.4, 3.5.0 | server | 0 | 0 | there is some expensive debug code in DataTree.processTxn() that formats transactions for debugging that are very expensive but are only used when errors happen and when debugging is turned on. | 227362 | No Perforce job exists for this issue. | 1 | 33283 | 8 years, 1 week, 5 days ago | 0|i06267: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1389 | it would be nice if start-foreground used exec $JAVA in order to get rid of the intermediate shell process |
Improvement | Resolved | Major | Fixed | Roman Shaposhnik | Roman Shaposhnik | Roman Shaposhnik | 08/Feb/12 14:42 | 16/Feb/12 05:55 | 15/Feb/12 18:03 | 3.4.2 | 3.3.5, 3.4.4, 3.5.0 | scripts, server | 0 | 0 | A lot of daemon management tools expect the process itself to be running as a child instead of a grand-child. It would be nice if we had an option for that in zkServer.sh | 227153 | No Perforce job exists for this issue. | 1 | 12515 | 8 years, 6 weeks ago |
Reviewed
|
0|i02hzj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1388 | Client side 'PathValidation' is missing for the multi-transaction api. |
Bug | Closed | Major | Fixed | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 07/Feb/12 00:17 | 13/Mar/14 14:17 | 17/Dec/13 11:57 | 3.4.0 | 3.4.6, 3.5.0 | java client | 0 | 9 | Multi ops: the Op.create(path,..), Op.delete(path, ..), Op.setData(path, ..), Op.check(path, ...) APIs do not perform client-side path validation, so the call goes to the server side and an exception is thrown back to the client. It would be good to provide ZooKeeper client-side path validation for the multi transaction APIs. Presently it gets error codes from the server, which also do not properly convey the cause. For example: when an invalid znode path is specified in Op.create, it gives the following exception, which is not useful for knowing the actual cause. {code} org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode at org.apache.zookeeper.KeeperException.create(KeeperException.java:115) at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:1174) at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:1115) {code} |
226850 | No Perforce job exists for this issue. | 6 | 32580 | 6 years, 2 weeks ago |
Reviewed
|
0|i05xtz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
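The client-side validation ZOOKEEPER-1388 asks for rejects bad paths locally with a clear message instead of a bare server-side NoNode error. The rules below are a simplified subset written for illustration, not ZooKeeper's actual PathUtils.validatePath:

```java
// Simplified client-side path validation in the spirit of ZOOKEEPER-1388:
// fail fast locally, with a message that names the problem, before the
// multi op ever reaches the server.
class PathCheck {
    static void validate(String path) {
        if (path == null || path.isEmpty())
            throw new IllegalArgumentException("Path cannot be null or empty");
        if (!path.startsWith("/"))
            throw new IllegalArgumentException("Path must start with / : " + path);
        if (path.length() > 1 && path.endsWith("/"))
            throw new IllegalArgumentException("Path must not end with / : " + path);
        if (path.contains("//"))
            throw new IllegalArgumentException("Empty node name in path: " + path);
    }
}
```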
| ZooKeeper | ZOOKEEPER-1387 | Wrong epoch file created |
Bug | Closed | Minor | Fixed | Benjamin Reed | Benjamin Busjaeger | Benjamin Busjaeger | 06/Feb/12 00:57 | 13/Mar/14 14:16 | 13/Dec/12 03:00 | 3.4.2 | 3.4.6, 3.5.0 | quorum | 0 | 4 | It looks like line 443 in QuorumPeer [1] may need to change from: writeLongToFile(CURRENT_EPOCH_FILENAME, acceptedEpoch); to writeLongToFile(ACCEPTED_EPOCH_FILENAME, acceptedEpoch); I only noticed this reading the code, so I may be wrong and I don't know yet if/how this affects the runtime. [1] https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java#L443 |
226662 | No Perforce job exists for this issue. | 2 | 2376 | 6 years, 2 weeks ago |
Reviewed
|
0|i00rfj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1386 | avoid flaky URL redirection in "ant javadoc" : replace "http://java.sun.com/javase/6/docs/api/" with "http://download.oracle.com/javase/6/docs/api/" |
Bug | Resolved | Minor | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 03/Feb/12 14:14 | 27/Feb/12 19:23 | 26/Feb/12 20:01 | 3.3.5, 3.4.4, 3.5.0 | documentation | 0 | 1 | HADOOP-8019 | It seems that the current javadoc.link.java value, http://java.sun.com/javase/6/docs/api/, redirects (via HTTP 301) to http://download.oracle.com/javase/6/docs/api/. This redirect does not always work apparently, causing the URL fetch to fail. This causes an additional javadoc warning: javadoc: warning - Error fetching URL: http://java.sun.com/javase/6/docs/api/package-list which can in turn cause Jenkins to give a -1 to an otherwise OK build (see e.g. https://issues.apache.org/jira/browse/ZOOKEEPER-1373?focusedCommentId=13199456&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13199456). |
226473 | No Perforce job exists for this issue. | 1 | 12514 | 8 years, 4 weeks, 3 days ago | 0|i02hzb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1385 | zookeeper.apache.org/doc/trunk/ has broken pointers |
Bug | Open | Major | Unresolved | Unassigned | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 01/Feb/12 12:27 | 09/Oct/13 02:44 | documentation | 0 | 0 | API Docs gives a "Not found" message. | 226136 | No Perforce job exists for this issue. | 0 | 32581 | 8 years, 8 weeks, 1 day ago | 0|i05xu7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1384 | test-cppunit overrides LD_LIBRARY_PATH and fails if gcc is in non-standard location |
Bug | Resolved | Minor | Fixed | Jay Shrauner | Jay Shrauner | Jay Shrauner | 31/Jan/12 19:51 | 19/Mar/12 07:00 | 19/Mar/12 02:17 | 3.4.2 | 3.4.4, 3.5.0 | build, tests | 0 | 1 | Linux | On Linux with gcc installed in /usr/local and the libs in /usr/local/lib64, test-core-cppunit fails because zktest-st is unable to find the right libstdc++. build.xml is overriding the environment LD_LIBRARY_PATH instead of appending to it. This should be changed to match the treatment of PATH by appending the desired extra path. |
226057 | No Perforce job exists for this issue. | 1 | 32582 | 8 years, 1 week, 3 days ago |
Reviewed
|
0|i05xuf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
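The fix ZOOKEEPER-1384 describes is to append to the caller's LD_LIBRARY_PATH rather than replace it, mirroring how PATH is already handled. In Ant that looks roughly like the following fragment (property names and paths are illustrative; the real targets live in the project's build.xml):

```xml
<!-- Load the caller's environment so ${env.LD_LIBRARY_PATH} is available -->
<property environment="env"/>

<exec executable="${test.cppunit.bin}">
  <!-- Append to the inherited LD_LIBRARY_PATH instead of overwriting it,
       so a non-standard libstdc++ (e.g. /usr/local/lib64) stays visible -->
  <env key="LD_LIBRARY_PATH"
       value="${zk.lib.dir}:${env.LD_LIBRARY_PATH}"/>
</exec>
```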
| ZooKeeper | ZOOKEEPER-1383 | Create update throughput quotas and add hard quota limits |
New Feature | Open | Major | Unresolved | Thawan Kooburat | Jay Shrauner | Jay Shrauner | 31/Jan/12 19:09 | 16/Jun/19 22:36 | server | 0 | 3 | ZOOKEEPER-3301 | Quotas exist for size (node count and size in bytes); it would be useful to track and support quotas on update throughput (bytes per second) as well. This can be tracked on both a node/subtree level for quota support as well as on the server level for monitoring. In addition, the existing quotas log a warning when they are exceeded but allow the transaction to proceed (soft quotas). It would also be useful to support a corresponding set of hard quota limits that fail the transaction. |
226050 | No Perforce job exists for this issue. | 4 | 2588 | 7 years, 5 weeks, 1 day ago | Adds support for throughput quotas (soft and hard) and hard node count and hard size quotas. Parses quota nodes from older versions of the server and preserves behavior of existing quotas (soft node count and soft size). | quotas | 0|i00sqn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
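The soft/hard distinction ZOOKEEPER-1383 proposes reduces to one admission rule: a soft quota warns and lets the transaction proceed, a hard quota fails it. A hypothetical sketch of that rule (not the server's quota code):

```java
// Soft vs hard quota semantics from ZOOKEEPER-1383: a soft quota only
// records a warning when exceeded, a hard quota rejects the update.
// `admit` returns whether the transaction should proceed.
class Quota {
    enum Kind { SOFT, HARD }

    static boolean admit(Kind kind, long used, long limit, StringBuilder warnings) {
        if (used <= limit) return true;
        if (kind == Kind.SOFT) {
            warnings.append("quota exceeded: ").append(used).append(" > ").append(limit);
            return true;   // soft: warn and allow the transaction
        }
        return false;      // hard: fail the transaction
    }
}
```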
| ZooKeeper | ZOOKEEPER-1382 | Zookeeper server holds onto dead/expired session ids in the watch data structures |
Bug | Closed | Critical | Fixed | Germán Blanco | Neha Narkhede | Neha Narkhede | 30/Jan/12 20:06 | 14/Oct/16 01:47 | 11/Dec/13 14:18 | 3.4.5 | 3.4.6, 3.5.0 | server | 2 | 18 | I've observed that zookeeper server holds onto expired session ids in the watcher data structures. The result is the wchp command reports session ids that cannot be found through cons/dump and those expired session ids sit there maybe until the server is restarted. Here are snippets from the client and the server logs that lead to this state, for one particular session id 0x134485fd7bcb26f - There are 4 servers in the zookeeper cluster - 223, 224, 225 (leader), 226 and I'm using ZkClient to connect to the cluster From the application log - application.log.2012-01-26-325.gz:2012/01/26 04:56:36.177 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application Session establishment complete on server 223.prod/172.17.135.38:12913, sessionid = 0x134485fd7bcb26f, negotiated timeout = 6000 application.log.2012-01-27.gz:2012/01/27 09:52:37.714 INFO [ClientCnxn] [main-SendThread(223.prod:12913)] [application] Client session timed out, have not heard from server in 9827ms for sessionid 0x134485fd7bcb26f, closing socket connection and attempting reconnect application.log.2012-01-27.gz:2012/01/27 09:52:38.191 INFO [ClientCnxn] [main-SendThread(226.prod:12913)] [application] Unable to reconnect to ZooKeeper service, session 0x134485fd7bcb26f has expired, closing socket connection On the leader zk, 225 - zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [SessionTracker:ZooKeeperServer@314] - Expiring session 0x134485fd7bcb26f, timeout of 6000ms exceeded zookeeper.log.2012-01-27-leader-225.gz:2012-01-27 09:52:34,010 - INFO [ProcessThread:-1:PrepRequestProcessor@391] - Processed session termination for sessionid: 0x134485fd7bcb26f On the server, the client was initially connected to, 223 - zookeeper.log.2012-01-26-223.gz:2012-01-26 04:56:36,173 - INFO 
[CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x134485fd7bcb26f with negotiated timeout 6000 for client /172.17.136.82:45020 zookeeper.log.2012-01-27-223.gz:2012-01-27 09:52:34,018 - INFO [CommitProcessor:1:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:45020 which had sessionid 0x134485fd7bcb26f Here are the log snippets from 226, which is the server the client reconnected to before getting the session expired event - 2012-01-27 09:52:38,190 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@770] - Client attempting to renew session 0x134485fd7bcb26f at /172.17.136.82:49367 2012-01-27 09:52:38,191 - INFO [QuorumPeer:/0.0.0.0:12913:NIOServerCnxn@1573] - Invalid session 0x134485fd7bcb26f for client /172.17.136.82:49367, probably expired 2012-01-27 09:52:38,191 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:12913:NIOServerCnxn@1435] - Closed socket connection for client /172.17.136.82:49367 which had sessionid 0x134485fd7bcb26f wchp output from 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*wchp* | wc -l 3 wchp output from 223, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*wchp* | wc -l 0 cons output from 223 and 226, taken on 01/30 - nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *226.*cons* | wc -l 0 nnarkhed-ld:zk-cons-wchp-2012013000 nnarkhed$ grep 0x134485fd7bcb26f *223.*cons* | wc -l 0 So, what seems to have happened is that the client was able to re-register the watches on the new server (226), after it got disconnected from 223, in spite of having an expired session id. In NIOServerCnxn, I saw that after suspecting that a session is expired, a server removes the cnxn and its watches from its internal data structures. 
But before that, it allows more requests to be processed even if the session is expired - // Now that the session is ready we can start receiving packets synchronized (this.factory) { sk.selector().wakeup(); enableRecv(); } } catch (Exception e) { LOG.warn("Exception while establishing session, closing", e); close(); } I wonder if the client somehow sneaked in the set watches, right after the server removed the connection through the removeCnxn() API? |
225890 | No Perforce job exists for this issue. | 10 | 32583 | 3 years, 22 weeks, 6 days ago | 0|i05xun: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1381 | Add a method to get the zookeeper server version from the client |
Improvement | Open | Minor | Unresolved | Unassigned | Nicolas Liochon | Nicolas Liochon | 30/Jan/12 16:31 | 28/Jun/12 04:10 | 3.4.2 | c client, documentation, java client, server | 0 | 3 | ZOOKEEPER-1455, ZOOKEEPER-1495, HBASE-6058 | all | Zookeeper client API is designed to be server version agnostic as much as possible, so we can have new clients with old servers (or the opposite). But today there is no simple way for a client to learn the server version. This would be very useful in order to: - check compatibility (ex: the 'multi' implementation is available since 3.4, while the 3.4 client API supports 3.3 servers as well) - have different implementations depending on the server's functionality. A workaround (proposed by Mahadev Konar) is to do "echo stat | nc hostname clientport" and parse the output to get the version. The output is, for example: ----------------------- Zookeeper version: 3.4.2--1, built on 01/30/2012 17:43 GMT Clients: /127.0.0.1:54951[0](queued=0,recved=1,sent=0) Latency min/avg/max: 0/0/0 Received: 1 Sent: 0 Outstanding: 0 Zxid: 0x500000001 Mode: follower Node count: 7 -------------------- |
newbie | 225852 | No Perforce job exists for this issue. | 1 | 41983 | 7 years, 39 weeks ago | 0|i07jun: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
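The "echo stat | nc" workaround described above can be automated on the client side. A minimal sketch, assuming only the format of the quoted stat output; the `StatVersionParser` class name and its regex are illustrative, not part of any ZooKeeper API, and fetching the response over a socket is left out:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extract the server version from the first line
// of the four-letter-word "stat" response shown in the report.
public class StatVersionParser {
    // Matches e.g. "Zookeeper version: 3.4.2--1, built on 01/30/2012 17:43 GMT"
    private static final Pattern VERSION_LINE =
            Pattern.compile("Zookeeper version: (\\d+\\.\\d+\\.\\d+)");

    public static String parse(String statOutput) {
        Matcher m = VERSION_LINE.matcher(statOutput);
        if (m.find()) {
            return m.group(1);
        }
        throw new IllegalArgumentException("no version line in stat output");
    }

    public static void main(String[] args) {
        String stat = "Zookeeper version: 3.4.2--1, built on 01/30/2012 17:43 GMT\n"
                + "Clients:\n /127.0.0.1:54951[0](queued=0,recved=1,sent=0)\n";
        System.out.println(parse(stat)); // prints 3.4.2
    }
}
```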
| ZooKeeper | ZOOKEEPER-1380 | zkperl: _zk_release_watch doesn't remove items properly from the watch list |
Bug | Resolved | Major | Fixed | Botond Hejj | Botond Hejj | Botond Hejj | 30/Jan/12 12:14 | 07/Sep/12 07:01 | 07/Sep/12 02:15 | 3.3.3, 3.3.4, 3.4.0, 3.4.1, 3.4.2 | 3.4.4, 3.5.0 | contrib-bindings | 0 | 3 | The doubly linked list of watches is not updated properly if a watch is taken out from the middle of the chain. The item after the item which is taken out will receive null pointer for the previous element! This will make the doubly linked list inconsistent and can lead to segfault or infinite loop when the doubly linked list is iterated later. |
225810 | No Perforce job exists for this issue. | 1 | 32584 | 7 years, 28 weeks, 6 days ago |
Reviewed
|
zookeeper perl zkperl Net-ZooKeeper | 0|i05xuv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
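The unlink bug described above is the classic doubly-linked-list mistake: removing a middle node without updating the successor's back-pointer. The actual zkperl code is C; this small Java model (the `WatchList`/`Node` names are hypothetical) just illustrates the invariant the report says was broken:

```java
// Minimal sketch of a correct unlink for a doubly linked watch list.
public class WatchList {
    static class Node {
        Node prev, next;
        final String watch;
        Node(String watch) { this.watch = watch; }
    }

    Node head;

    Node append(String watch) {
        Node n = new Node(watch);
        if (head == null) { head = n; return n; }
        Node tail = head;
        while (tail.next != null) tail = tail.next;
        tail.next = n;
        n.prev = tail;
        return n;
    }

    void remove(Node n) {
        if (n.prev != null) n.prev.next = n.next;
        else head = n.next;
        // This back-pointer update is the step the report says was
        // missing: without it, n.next.prev is left null, and later
        // iteration can segfault or loop forever.
        if (n.next != null) n.next.prev = n.prev;
    }
}
```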
| ZooKeeper | ZOOKEEPER-1379 | 'printwatches, redo, history and connect '. client commands always print usage. This is not necessary |
Bug | Closed | Minor | Fixed | Edward Ribeiro | kavita sharma | kavita sharma | 30/Jan/12 03:43 | 13/Mar/14 14:17 | 02/Sep/13 17:05 | 3.4.0 | 3.4.6, 3.5.0 | java client | 0 | 4 | While executing the commands 'printwatches, redo, history and connect', the usage text is printed every time. Usage should only be printed when the user enters a command incorrectly, but for these commands it is printed on every invocation. eg {noformat} [zk: localhost:2181(CONNECTED) 0] printwatches printwatches is on ZooKeeper -server host:port cmd args connect host:port get path [watch] ls path [watch] set path data [version] delquota [-n|-b] path quit printwatches on|off create [-s] [-e] path data acl stat path [watch] close ls2 path [watch] history listquota path setAcl path acl getAcl path sync path redo cmdno addauth scheme auth delete path [version] setquota -n|-b val path {noformat} |
225740 | No Perforce job exists for this issue. | 5 | 41984 | 6 years, 2 weeks ago | 0|i07juv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1378 | Provide option to turn off sending of diffs |
Task | Open | Major | Unresolved | Unassigned | Ted Yu | Ted Yu | 29/Jan/12 17:51 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | 0 | 4 | From Patrick: we need to have an option to turn off sending of diffs. There are a couple of really strong reasons I can think of to do this: 1) 3.3.x is broken in a similar way; there is an upgrade problem we can't solve short of having people first upgrade to a fixed 3.3 (3.3.5, say) and then upgrade to 3.4.x. If we could turn off diff sending this would address the problem. 2) safety valve. Say we find another new problem with diff sending in 3.4/3.5. Having an option to turn it off would be useful for people as a workaround until a fix is found and released. |
225720 | No Perforce job exists for this issue. | 0 | 41985 | 4 years, 2 days ago | 0|i07jv3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1377 | add support for dumping a snapshot file content (similar to LogFormatter) |
Improvement | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 27/Jan/12 13:45 | 18/Mar/12 18:26 | 18/Mar/12 18:26 | 3.4.4, 3.5.0 | server | 0 | 1 | We have LogFormatter but not SnapshotFormatter. I've added this, patch momentarily. | newbie | 225592 | No Perforce job exists for this issue. | 2 | 33284 | 8 years, 1 week, 4 days ago | 0|i0626f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1376 | zkServer.sh does not correctly check for $SERVER_JVMFLAGS |
Bug | Resolved | Minor | Fixed | Skye Wanderman-Milne | Patrick D. Hunt | Patrick D. Hunt | 26/Jan/12 20:39 | 24/Sep/12 14:29 | 21/Sep/12 19:05 | 3.3.3, 3.3.4 | 3.3.7, 3.4.5 | scripts | 0 | 3 | ZOOKEEPER-1012 | The script will always include $SERVER_JVMFLAGS even when it is not defined (though with little harm): in if [ "x$SERVER_JVMFLAGS" ] then JVMFLAGS="$SERVER_JVMFLAGS $JVMFLAGS" fi the test is always true, because "x$SERVER_JVMFLAGS" expands to at least the literal "x". It should use the standard idiom, [ "x$SERVER_JVMFLAGS" != "x" ]. |
newbie | 225490 | No Perforce job exists for this issue. | 1 | 32585 | 7 years, 26 weeks, 6 days ago |
Reviewed
|
0|i05xv3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1375 | SendThread is exiting after OOMError |
Bug | Open | Major | Unresolved | Unassigned | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 25/Jan/12 03:43 | 12/Sep/13 18:47 | 3.4.0 | 0 | 5 | After reviewing the ClientCnxn code, there is still a chance of the SendThread exiting without notifying users. Say the client throws an OOMError and enters the Throwable block. Here, while sending the Disconnected event, it creates a "new WatchedEvent()" object. This allocation can itself throw OOMError, causing the SendThread to exit without any Disconnected event notification. {noformat} try { //... } catch (Throwable e) { //.. cleanup(); if (state.isAlive()) { eventThread.queueEvent( new WatchedEvent(Event.EventType.None, Event.KeeperState.Disconnected, null)); } //.... } {noformat} |
225232 | No Perforce job exists for this issue. | 0 | 32586 | 6 years, 28 weeks ago | 0|i05xvb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
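One common defensive pattern for the failure mode described above is to pre-allocate the notification object so the error path performs no allocation. This is only a sketch of that idea, not the fix the project adopted; `SendLoop` and `DisconnectEvent` are hypothetical stand-ins for ClientCnxn's SendThread and WatchedEvent:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch: the Disconnected event is built once, up front, while memory
// is still available, so catch (Throwable) cannot itself OOM on "new".
public class SendLoop {
    static final class DisconnectEvent { }

    // Allocated eagerly at class load time.
    private static final DisconnectEvent DISCONNECTED = new DisconnectEvent();

    private final Queue<Object> eventQueue = new ArrayDeque<>();

    void run(Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            // No allocation on this path: enqueue the pre-built event,
            // so the user still receives a Disconnected notification
            // even when the failure was an OutOfMemoryError.
            eventQueue.add(DISCONNECTED);
        }
    }

    int pendingEvents() { return eventQueue.size(); }
}
```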
| ZooKeeper | ZOOKEEPER-1374 | C client multi-threaded test suite fails to compile on ARM architectures. |
Bug | Resolved | Minor | Fixed | James Page | James Page | James Page | 24/Jan/12 10:22 | 28/Jun/16 04:37 | 06/Feb/12 04:54 | 3.3.4 | 3.4.3, 3.5.0 | c client | 0 | 3 | ZOOKEEPER-2453 | Ubuntu 12.04 (precise) armel or armhf | The multi-threaded test suite fails to build on ARM architectures: g++ -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -D_FORTIFY_SOURCE=2 -DUSE_STATIC_LIB -DTHREADED -DZKSERVER_CMD="\"./tests/zkServer.sh\"" -Wall -g -MT zktest_mt-ThreadingUtil.o -MD -MP -MF .deps/zktest_mt-ThreadingUtil.Tpo -c -o zktest_mt-ThreadingUtil.o `test -f 'tests/ThreadingUtil.cc' || echo './'`tests/ThreadingUtil.cc /tmp/ccqJWQRC.s: Assembler messages: /tmp/ccqJWQRC.s:373: Error: bad instruction `lock xaddl r4,[r3,#0]' /tmp/ccqJWQRC.s:425: Error: bad instruction `lock xchgl r4,[r3,#0]' gcc does provide alternative primitives (__sync_*) which provide better cross-platform compatibility, but that assumes a) gcc is being used or b) the primitives are provided by alternative compilers. Tracked in Ubuntu here: https://bugs.launchpad.net/ubuntu/+source/zookeeper/+bug/920871 |
225124 | No Perforce job exists for this issue. | 2 | 32587 | 8 years, 7 weeks, 3 days ago |
Reviewed
|
0|i05xvj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1373 | Hardcoded SASL login context name clashes with Hadoop security configuration override |
Bug | Resolved | Major | Fixed | Eugene Joseph Koontz | Thomas Weise | Thomas Weise | 23/Jan/12 22:40 | 01/May/13 22:29 | 06/Feb/12 03:37 | 3.4.2 | 3.4.3, 3.5.0 | java client | 0 | 4 | HADOOP-7853, ZOOKEEPER-938, ZOOKEEPER-1497, HBASE-4791, HIVE-2712, ZOOKEEPER-1467 | I'm trying to configure a process with Hadoop security (Hive metastore server) to talk to ZooKeeper 3.4.2 with Kerberos authentication. In this scenario Hadoop controls the SASL configuration (org.apache.hadoop.security.UserGroupInformation.HadoopConfiguration), instead of setting up the ZooKeeper "Client" loginContext via jaas.conf and system property {{-Djava.security.auth.login.config}} Using the Hadoop configuration would work, except that ZooKeeper client code expects the loginContextName to be "Client" while Hadoop security will use "hadoop-keytab-kerberos". I verified that by changing the name in the debugger the SASL authentication succeeds while otherwise the login configuration cannot be resolved and the connection to ZooKeeper is unauthenticated. To integrate with Hadoop, the following in ZooKeeperSaslClient would need to change to make the name configurable: {{login = new Login("Client",new ClientCallbackHandler(null));}} |
225065 | No Perforce job exists for this issue. | 7 | 32588 | 8 years, 7 weeks, 2 days ago | 0|i05xvr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1372 | stat reports inconsistent zxids across servers after a leader change |
Bug | Open | Major | Unresolved | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 23/Jan/12 20:33 | 23/Jan/12 20:33 | 3.4.2 | quorum | 0 | 1 | I started a 2 server ensemble, made some changes to znodes, then shutdown the cluster. I then removed the datadir from the original leader. I then restarted the entire ensemble. after this the new leader has a zxid of 0x400000000 while the follower reported a zxid of 0x300000007 (the last zxid of the old epoch). This was via stat. I then connected a client to the ensemble, subsequent to which the zxid was again in sync. The data all seemed fine, but stat was reporting invalid information until a client connected. |
225059 | No Perforce job exists for this issue. | 0 | 32589 | 8 years, 9 weeks, 2 days ago | 0|i05xvz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1371 | Remove dependency on log4j in the source code. |
Bug | Closed | Major | Fixed | Mohammad Arshad | Mahadev Konar | Mahadev Konar | 23/Jan/12 19:30 | 24/Feb/20 20:21 | 21/Nov/15 16:21 | 3.4.0, 3.4.1, 3.4.2, 3.4.3 | 3.5.2, 3.6.0 | 6 | 24 | ZOOKEEPER-2342, ZOOKEEPER-3737, ZOOKEEPER-2393, ZOOKEEPER-850 | ZOOKEEPER-850 added slf4j to ZK. We still depend on log4j in our codebase. We should remove the dependency on log4j so that we can make logging pluggable. |
patch | 225049 | No Perforce job exists for this issue. | 5 | 32590 | 2 years, 50 weeks, 2 days ago | 0|i05xw7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1370 | Add logging changes in Release Notes needed for clients because of ZOOKEEPER-850. |
Bug | Resolved | Major | Fixed | Mahadev Konar | Mahadev Konar | Mahadev Konar | 23/Jan/12 19:28 | 06/Feb/12 05:29 | 06/Feb/12 05:29 | 3.4.3 | 0 | 1 | 225048 | No Perforce job exists for this issue. | 0 | 32591 | 8 years, 7 weeks, 3 days ago | 0|i05xwf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1369 | Mock access to time-related methods |
Improvement | Open | Major | Unresolved | Unassigned | Henry Robinson | Henry Robinson | 23/Jan/12 15:47 | 23/Jan/12 15:47 | 0 | 0 | As we began to discuss in ZOOKEEPER-1366, it would be great to have the ability to mock out time methods anywhere to help with deterministic, more efficient testing. The general idea is to have a 'mock clock' that any thread can interact with as though it were the real clock. Time would typically be advanced by an independent thread of control (normally the thread that the test is running in). There are two main method calls that interact with the JVM clock: # {{System.currentTimeMillis}} - very easy to mock # {{Thread.sleep}} - slightly harder, since the mock clock would need to keep an ordered list of threads that need to be woken up and release a barrier for each one as time was advanced. Other implicit methods, such as setting the socket rx timeout, are probably too hard to mock and are out of scope for this ticket. |
225028 | No Perforce job exists for this issue. | 0 | 41986 | 8 years, 9 weeks, 3 days ago | 0|i07jvb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1368 | zookeeper c client get apis crash if zhandle is null |
Bug | Open | Major | Unresolved | Unassigned | Marc Celani | Marc Celani | 21/Jan/12 23:06 | 23/Jan/12 15:11 | c client | 0 | 2 | 604800 | 604800 | 0% | Although wget, awget, wexists, awexists, wgetchildren, awgetchildren will return ZBADARGUMENTS when zh is null, the get APIs will crash if you request a watch, as they dereference the zh without checking for null in order to get the watch function. | 0% | 0% | 604800 | 604800 | newbie | 224832 | No Perforce job exists for this issue. | 0 | 32592 | 8 years, 9 weeks, 4 days ago | 0|i05xwn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1367 | Data inconsistencies and unexpired ephemeral nodes after cluster restart |
Bug | Resolved | Blocker | Fixed | Benjamin Reed | Jeremy Stribling | Jeremy Stribling | 20/Jan/12 13:48 | 28/Aug/13 18:20 | 31/Jan/12 01:56 | 3.4.2 | 3.4.3, 3.3.5, 3.5.0 | server | 0 | 9 | Debian Squeeze, 64-bit | In one of our tests, we have a cluster of three ZooKeeper servers. We kill all three, and then restart just two of them. Sometimes we notice that on one of the restarted servers, ephemeral nodes from previous sessions do not get deleted, while on the other server they do. We are effectively running 3.4.2, though technically we are running 3.4.1 with the patch manually applied for ZOOKEEPER-1333 and a C client for 3.4.1 with the patches for ZOOKEEPER-1163. I noticed that when I connected using zkCli.sh to the first node (90.0.0.221, zkid 84), I saw only one znode in a particular path: {quote} [zk: 90.0.0.221:2888(CONNECTED) 0] ls /election/zkrsm [nominee0000000011] [zk: 90.0.0.221:2888(CONNECTED) 1] get /election/zkrsm/nominee0000000011 90.0.0.222:7777 cZxid = 0x400000027 ctime = Thu Jan 19 08:18:24 UTC 2012 mZxid = 0x400000027 mtime = Thu Jan 19 08:18:24 UTC 2012 pZxid = 0x400000027 cversion = 0 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0xa234f4f3bc220001 dataLength = 16 numChildren = 0 {quote} However, when I connect zkCli.sh to the second server (90.0.0.222, zkid 251), I saw three znodes under that same path: {quote} [zk: 90.0.0.222:2888(CONNECTED) 2] ls /election/zkrsm nominee0000000006 nominee0000000010 nominee0000000011 [zk: 90.0.0.222:2888(CONNECTED) 2] get /election/zkrsm/nominee0000000011 90.0.0.222:7777 cZxid = 0x400000027 ctime = Thu Jan 19 08:18:24 UTC 2012 mZxid = 0x400000027 mtime = Thu Jan 19 08:18:24 UTC 2012 pZxid = 0x400000027 cversion = 0 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0xa234f4f3bc220001 dataLength = 16 numChildren = 0 [zk: 90.0.0.222:2888(CONNECTED) 3] get /election/zkrsm/nominee0000000010 90.0.0.221:7777 cZxid = 0x30000014c ctime = Thu Jan 19 07:53:42 UTC 2012 mZxid = 0x30000014c mtime = Thu Jan 19 07:53:42 UTC 
2012 pZxid = 0x30000014c cversion = 0 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0xa234f4f3bc220000 dataLength = 16 numChildren = 0 [zk: 90.0.0.222:2888(CONNECTED) 4] get /election/zkrsm/nominee0000000006 90.0.0.223:7777 cZxid = 0x200000cab ctime = Thu Jan 19 08:00:30 UTC 2012 mZxid = 0x200000cab mtime = Thu Jan 19 08:00:30 UTC 2012 pZxid = 0x200000cab cversion = 0 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x5434f5074e040002 dataLength = 16 numChildren = 0 {quote} These never went away for the lifetime of the server, for any clients connected directly to that server. Note that this cluster is configured to have all three servers still, the third one being down (90.0.0.223, zkid 162). I captured the data/snapshot directories for the the two live servers. When I start single-node servers using each directory, I can briefly see that the inconsistent data is present in those logs, though the ephemeral nodes seem to get (correctly) cleaned up pretty soon after I start the server. I will upload a tar containing the debug logs and data directories from the failure. I think we can reproduce it regularly if you need more info. |
224696 | No Perforce job exists for this issue. | 6 | 32593 | 6 years, 30 weeks, 1 day ago | Fix Data inconsistencies and unexpired ephemeral nodes after cluster restart. |
Reviewed
|
0|i05xwv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1366 | Zookeeper should be tolerant of clock adjustments |
Bug | Resolved | Critical | Fixed | Hongchao Deng | Ted Dunning | Ted Dunning | 19/Jan/12 02:00 | 11/May/17 16:34 | 06/Feb/15 00:19 | 3.5.1, 3.6.0 | 7 | 32 | ZOOKEEPER-1626 | ZOOKEEPER-1626, ZOOKEEPER-366, ZOOKEEPER-1616, ZOOKEEPER-2774 | If you want to wreak havoc on a ZK based system just do [date -s "+1hour"] and watch the mayhem as all sessions expire at once. This shouldn't happen. Zookeeper could easily handle elapsed times as elapsed times rather than as differences between absolute times. The absolute times are subject to adjustment when the clock is set, while a timer is not subject to this problem. In Java, System.currentTimeMillis() gives you absolute time while System.nanoTime() gives you time based on a timer from an arbitrary epoch. I have done this and have been running tests now for some tens of minutes with no failures. I will set up a test machine to redo the build on Ubuntu and post a patch here for discussion. |
224432 | No Perforce job exists for this issue. | 18 | 32594 | 5 years, 6 weeks, 6 days ago | 0|i05xx3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
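The nanoTime-based approach the report proposes can be sketched in a few lines; `ElapsedTimer` is an illustrative name, not the class the eventual patch introduced:

```java
// Illustration of the fix proposed above: measure timeouts as elapsed
// time from System.nanoTime(), which is monotonic, instead of
// differencing System.currentTimeMillis(), which jumps when the wall
// clock is stepped (e.g. `date -s "+1hour"`).
public class ElapsedTimer {
    private final long startNanos = System.nanoTime();

    /** Milliseconds elapsed since construction; unaffected by clock steps. */
    public long elapsedMillis() {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    public boolean expired(long timeoutMillis) {
        return elapsedMillis() >= timeoutMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        ElapsedTimer t = new ElapsedTimer();
        Thread.sleep(50);
        System.out.println(t.expired(10)); // true: at least ~50ms elapsed
    }
}
```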
| ZooKeeper | ZOOKEEPER-1365 | Removing a duplicate function and another minor cleanup in QuorumPeer.java |
Improvement | Resolved | Trivial | Not A Problem | Alexander Shraer | Alexander Shraer | Alexander Shraer | 18/Jan/12 20:11 | 19/Jan/12 20:12 | 19/Jan/12 20:12 | server | 0 | 0 | - getMyId() and getId() in QuorumPeer are doing the same thing - QuorumPeer.quorumPeers is being read directly from outside QuorumPeer, although we have the getter QuorumPeers.getView(). The purpose of this cleanup is to later be able to change more easily the way QuorumPeer manages its list of peers (to support dynamic changes in this list). |
224410 | No Perforce job exists for this issue. | 2 | 33285 | 8 years, 9 weeks, 6 days ago | 0|i0626n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1364 | Add orthogonal fault injection mechanism/framework |
Test | Open | Major | Unresolved | Andrei Savu | Andrei Savu | Andrei Savu | 17/Jan/12 10:56 | 14/Jan/20 05:56 | tests | 0 | 3 | 0 | 3600 | ZOOKEEPER-2549, ZOOKEEPER-3601, HDFS-435 | Hadoop has a mechanism for doing fault injection (HDFS-435). I think it would be useful if something similar would be available for ZooKeeper. | 100% | 100% | 3600 | 0 | pull-request-available | 224149 | No Perforce job exists for this issue. | 0 | 41987 | 3 years, 12 weeks, 2 days ago | 0|i07jvj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1363 | Categorise unit tests by 'test-commit', 'full-test' etc |
Improvement | Resolved | Major | Won't Fix | Mark Fenes | Henry Robinson | Henry Robinson | 17/Jan/12 01:33 | 23/Mar/18 10:43 | 23/Mar/18 10:43 | build, tests | 0 | 2 | As discussed on the list, it would be good to split the Java test suite into categories so that it's easy to run a small set of unit tests against a patch, and to leave Jenkins to run the full suite of stress tests etc. | newbie | 224104 | No Perforce job exists for this issue. | 0 | 41988 | 1 year, 51 weeks, 6 days ago | 0|i07jvr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1362 | ZooDefs.Ids ACL lists not immutable |
Improvement | Open | Trivial | Unresolved | Unassigned | Tassos Souris | Tassos Souris | 16/Jan/12 14:51 | 23/Nov/16 11:52 | java client | 0 | 3 | In org.apache.zookeeper: 1) ZooDefs.Ids.OPEN_ACL_UNSAFE 2) ZooDefs.Ids.CREATOR_ALL_ACL 3) ZooDefs.Ids.READ_ALL_ACL are not immutable lists. Unlikely but the client could alter them. |
224068 | No Perforce job exists for this issue. | 0 | 41989 | 3 years, 17 weeks, 1 day ago | 0|i07jvz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
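The fix the report asks for is the standard JDK idiom: publish the shared lists through Collections.unmodifiableList so a client that tries to alter them fails fast instead of mutating global state. A minimal sketch; `AclIds` and its String elements are simplified stand-ins for ZooDefs.Ids and the real ACL entries:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of an immutable shared ACL constant.
public class AclIds {
    public static final List<String> OPEN_ACL_UNSAFE;
    static {
        List<String> acl = new ArrayList<>();
        acl.add("world:anyone:all");
        // Any add/remove/set through this view now throws
        // UnsupportedOperationException.
        OPEN_ACL_UNSAFE = Collections.unmodifiableList(acl);
    }
}
```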
| ZooKeeper | ZOOKEEPER-1361 | Leader.lead iterates over 'learners' set without proper synchronisation |
Bug | Resolved | Major | Fixed | Henry Robinson | Henry Robinson | Henry Robinson | 13/Jan/12 12:43 | 17/Sep/12 01:04 | 17/Sep/12 01:04 | 3.4.2 | 3.4.4, 3.5.0 | 0 | 5 | This block: {code} HashSet<Long> followerSet = new HashSet<Long>(); for(LearnerHandler f : learners) followerSet.add(f.getSid()); {code} is executed without holding the lock on learners, so if there were ever a condition where a new learner was added during the initial sync phase, I'm pretty sure we'd see a concurrent modification exception. Certainly other parts of the code are very careful to lock on learners when iterating. It would be nice to use a {{ConcurrentHashMap}} to hold the learners instead, but I can't convince myself that this wouldn't introduce some correctness bugs. For example the following: Learners contains A, B, C, D Thread 1 iterates over learners, and gets as far as B. Thread 2 removes A, and adds E. Thread 1 continues iterating and sees a learner view of A, B, C, D, E This may be a bug if Thread 1 is counting the number of synced followers for a quorum count, since at no point was A, B, C, D, E a correct view of the quorum. In practice, I think this is actually ok, because I don't think ZK makes any strong ordering guarantees on learners joining or leaving (so we don't need a strong serialisability guarantee on learners) but I don't think I'll make that change for this patch. Instead I want to clean up the locking protocols on the follower / learner sets - to avoid another easy deadlock like the one we saw in ZOOKEEPER-1294 - and to do less with the lock held; i.e. to copy and then iterate over the copy rather than iterate over a locked set. |
223846 | No Perforce job exists for this issue. | 5 | 32595 | 7 years, 27 weeks, 3 days ago | 0|i05xxb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
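The "copy and then iterate over the copy" protocol the report recommends can be sketched briefly. `LearnerSet` and the Long sids here are simplified stand-ins for Leader.learners and LearnerHandler.getSid(), not the actual patch:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: hold the lock only long enough to snapshot the learner set,
// then compute over the snapshot lock-free. Iterating a copy means a
// concurrent add/remove can no longer throw
// ConcurrentModificationException under the iterator's feet.
public class LearnerSet {
    private final Set<Long> learners = new HashSet<>();

    public void add(long sid) {
        synchronized (learners) { learners.add(sid); }
    }

    public void remove(long sid) {
        synchronized (learners) { learners.remove(sid); }
    }

    /** Returns a private copy taken while holding the lock. */
    public Set<Long> snapshotSids() {
        synchronized (learners) {
            return new HashSet<>(learners);
        }
    }
}
```

Note the caveat from the report still applies: a snapshot is a point-in-time view, so a quorum count computed from it can be stale by the time it is used; the claim is only that ZK makes no strong ordering guarantee on learners joining or leaving, so staleness is acceptable where a torn iteration is not.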
| ZooKeeper | ZOOKEEPER-1360 | QuorumTest.testNoLogBeforeLeaderEstablishment has several problems |
Bug | Open | Major | Unresolved | Abraham Fine | Henry Robinson | Henry Robinson | 12/Jan/12 02:44 | 05/Feb/20 07:16 | 3.4.2 | 3.7.0, 3.5.8 | tests | 0 | 4 | After the apparently valid fix to ZOOKEEPER-1294, testNoLogBeforeLeaderEstablishment is failing for me about one time in four. While I'll investigate whether the patch in 1294 is ultimately to blame, reading the test brought to light a number of issues that appear to be bugs or in need of improvement: * As part of QuorumTest, an ensemble is already established by the fixture setup code, but apparently unused by the test which uses QuorumUtil. * The test reads QuorumPeer.leader and QuorumPeer.follower without synchronization, which means that writes to those fields may not be published when we come to read them. * The return value of sem.tryAcquire is never checked. * The progress of the test is based on ad-hoc timings (25 * 500ms sleeps) and inscrutable numbers of iterations through the main loop (e.g. the semaphore blocking the final asserts is released only after the 20000th of 50000 callbacks) * The test as a whole takes ~30s to run The first three are easy to fix (as part of fixing the second, I intend to hide all members of QuorumPeer behind getters and setters), the fourth and fifth need a slightly deeper understanding of what the test is trying to achieve. |
223665 | No Perforce job exists for this issue. | 0 | 32596 | 2 years, 31 weeks, 3 days ago | 0|i05xxj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1359 | ZkCli create command data and acl parts should be optional. |
Bug | Resolved | Trivial | Duplicate | Unassigned | kavita sharma | kavita sharma | 10/Jan/12 03:46 | 01/Jul/13 17:37 | 16/Dec/12 01:50 | java client | 0 | 5 | In zkCli, a node can be created without data, yet the commandMap entry {noformat} commandMap.put("create", "[-s] [-e] path data acl"); {noformat} implies that the data and acl parts are mandatory. The usage string should mark these parts as optional. |
new | 223378 | No Perforce job exists for this issue. | 0 | 32597 | 6 years, 38 weeks, 3 days ago | 0|i05xxr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1358 | In StaticHostProviderTest.java, testNextDoesNotSleepForZero tests that hostProvider.next(0) doesn't sleep by checking that the latency of this call is less than 10sec |
Bug | Resolved | Trivial | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 09/Jan/12 20:46 | 15/Jan/12 22:56 | 15/Jan/12 21:20 | 3.5.0 | 0 | 1 | should check for something smaller, perhaps 1ms or 5ms | 223356 | No Perforce job exists for this issue. | 2 | 32598 | 8 years, 10 weeks, 3 days ago | 0|i05xxz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1357 | Zab1_0Test uses hard-wired port numbers. Specifically, it uses the same port for leader in two different tests. The second test periodically fails complaining that the port is still in use. |
Bug | Resolved | Minor | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 09/Jan/12 18:04 | 14/Apr/14 18:31 | 14/Apr/14 17:53 | 3.5.0 | 3.5.0 | tests | 0 | 4 | Here's what I get: Testcase: testLeaderInConnectingFollowers took 34.117 sec Testcase: testLastAcceptedEpoch took 0.047 sec <----- new test added in ZK-1343 Testcase: testLeaderInElectingFollowers took 0.004 sec Caused an ERROR Address already in use java.net.BindException: Address already in use at java.net.PlainSocketImpl.socketBind(Native Method) at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:383) at java.net.ServerSocket.bind(ServerSocket.java:328) at java.net.ServerSocket.<init>(ServerSocket.java:194) at java.net.ServerSocket.<init>(ServerSocket.java:106) at org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:220) at org.apache.zookeeper.server.quorum.Zab1_0Test.createLeader(Zab1_0Test.java:711) at org.apache.zookeeper.server.quorum.Zab1_0Test.testLeaderInElectingFollowers(Zab1_0Test.java:225) Testcase: testNormalFollowerRun took 29.128 sec Testcase: testNormalRun took 25.158 sec Testcase: testLeaderBehind took 25.148 sec Testcase: testAbandonBeforeACKEpoch took 34.029 sec My guess is that testLastAcceptedEpoch doesn't properly close the connection before testLeaderInElectingFollowers starts. I propose to add if (leadThread != null) { leadThread.interrupt(); leadThread.join(); } to the test. In addition, I propose to change the hard-wired ports in Zab1_0Test to use Portassignment.unique() as done in other tests. If I understand correctly the static counter used in unique() to assign ports is initialized once per test file, so it would also prevent the problem I'm seeing here of two tests in the same file trying to use the same port. The error can be reproduced using the attached patch (for some reason I don't see the problem in the trunk). |
223337 | No Perforce job exists for this issue. | 2 | 32599 | 5 years, 49 weeks, 3 days ago | 0|i05xy7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1356 | Avoid permanent caching of server IPs in the client |
Bug | Resolved | Major | Duplicate | Neha Narkhede | Neha Narkhede | Neha Narkhede | 09/Jan/12 17:42 | 05/Feb/16 04:54 | 10/Jan/12 11:39 | 3.3.4, 3.4.2 | java client | 0 | 3 | Relevant conversation on the dev mailing list - https://email.corp.linkedin.com/owa/redir.aspx?C=87f3d1e78c96438c8115e450f410d010&URL=http%3a%2f%2fmarkmail.org%2fmessage%2f3vzynx6rgurubf3p%3fq%3dPerforming%2bno%2bdowntime%2bhardware%2bchanges%2bto%2ba%2blive%2bzookeeper%2bcluster%2blist%3aorg%252Eapache%252Ehadoop%252Ezookeeper-dev Basically, the client caches the list of server IPs internally and maintains that list for the entire lifetime of the client. This limits the ability to remove/change a server node from a zookeeper cluster, without having to restart every client. Also, two levels of IP caching, one in the JVM and one in the zookeeper client code, seem unnecessary. It would be ideal to provide a config option that would turn off this IP caching in the client and re-resolve the host names during the reconnect. |
223333 | No Perforce job exists for this issue. | 0 | 32600 | 4 years, 6 weeks, 6 days ago | 0|i05xyf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1355 | Add zk.updateServerList(newServerList) |
New Feature | Resolved | Major | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 09/Jan/12 17:16 | 13/Feb/20 14:05 | 17/Nov/12 09:03 | 3.5.0 | c client, java client | 3 | 13 | ZOOKEEPER-1660, ZOOKEEPER-762, ZOOKEEPER-3726, ZOOKEEPER-1683, ZOOKEEPER-338, ZOOKEEPER-390, ZOOKEEPER-107 | When the set of servers changes, we would like to update the server list stored by clients without restarting the clients. Moreover, assuming that the number of clients per server is the same (in expectation) in the old configuration (as guaranteed by the current list shuffling for example), we would like to re-balance client connections across the new set of servers in a way that a) the number of clients per server is the same for all servers (in expectation) and b) there is no excessive/unnecessary client migration. It is simple to achieve (a) without (b) - just re-shuffle the new list of servers at every client. But this would create unnecessary migration, which we'd like to avoid. We propose a simple probabilistic migration scheme that achieves (a) and (b) - each client locally decides whether and where to migrate when the list of servers changes. The attached document describes the scheme and shows an evaluation of it in Zookeeper. We also implemented re-balancing through a consistent-hashing scheme and show a comparison. We derived the probabilistic migration rules from a simple formula that we can also provide, if someone's interested in the proof. |
223330 | No Perforce job exists for this issue. | 35 | 2598 | 7 years, 18 weeks, 5 days ago |
Reviewed
|
0|i00ssv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
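The probabilistic migration goal in ZOOKEEPER-1355 can be illustrated with a much-simplified sketch. This is not the scheme from the attached document; it only shows the special case where servers are added (old list a subset of the new): with N old and M > N servers, each client moves with probability (M - N)/M, movers pick uniformly among the added servers, so every server ends up with an equal expected load and no client migrates between surviving servers unnecessarily.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch (NOT the rules from the attached document) of
// probabilistic re-balancing when servers are only added. Each client decides
// locally, with no coordination: stay with probability N/M, otherwise migrate
// to one of the (M - N) added servers chosen uniformly at random.
public class MigrationSketch {
    public static String pickServer(String current, List<String> oldList,
                                    List<String> newList, Random rnd) {
        List<String> added = new ArrayList<>(newList);
        added.removeAll(oldList);
        if (added.isEmpty() || !newList.contains(current)) {
            // nothing added, or current server removed: reconnect uniformly
            return newList.get(rnd.nextInt(newList.size()));
        }
        double pMove = (double) added.size() / newList.size(); // (M - N) / M
        if (rnd.nextDouble() < pMove) {
            return added.get(rnd.nextInt(added.size())); // migrate to a new server
        }
        return current; // stay put: no unnecessary migration between old servers
    }
}
```

Re-shuffling the whole list at every client would also equalize load, but would migrate roughly (M - N)/M of the clients *plus* needlessly shuffle clients among the surviving servers; the local coin flip above avoids the latter.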
| ZooKeeper | ZOOKEEPER-1354 | AuthTest.testBadAuthThenSendOtherCommands fails intermittently |
Bug | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 06/Jan/12 20:10 | 01/Mar/12 22:21 | 01/Mar/12 22:05 | 3.4.0 | 3.4.4, 3.5.0 | tests | 0 | 1 | I'm seeing the following intermittent failure: {noformat} junit.framework.AssertionFailedError: Should have called my watcher expected:<1> but was:<0> at org.apache.zookeeper.test.AuthTest.testBadAuthThenSendOtherCommands(AuthTest.java:89) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) {noformat} The following commit introduced this test: bq. ZOOKEEPER-1152. Exceptions thrown from handleAuthentication can cause buffer corruption issues in NIOServer. (camille via breed) + Assert.assertEquals("Should have called my watcher", + 1, authFailed.get()); I think it's due to either a) the code is not waiting for the notification to be propagated, or b) the message doesn't make it back from the server to the client prior to the socket or the clientcnxn being closed. What do you think, should I just wait for the notification to arrive, or do you think it's b)? |
223123 | No Perforce job exists for this issue. | 1 | 12513 | 8 years, 3 weeks, 6 days ago | 0|i02hz3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1353 | C client test suite fails consistently |
Bug | Resolved | Minor | Fixed | Clint Byrum | Clint Byrum | Clint Byrum | 06/Jan/12 16:42 | 06/Feb/12 05:58 | 06/Feb/12 03:00 | 3.3.4 | 3.4.3, 3.3.5, 3.5.0 | c client, tests | 0 | 2 | 300 | 300 | 0% | Ubuntu precise (dev release), amd64 | When the C client test suite, zktest-mt, is run, it fails with this: tests/TestZookeeperInit.cc:233: Assertion: equality assertion failed [Expected: 2, Actual : 22] This was also reported in 3.3.1 here: http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg08914.html The C client tests are making some assumptions that are not valid. getaddrinfo may have, at one time, returned ENOENT instead of EINVAL for the host given in the test. The assertion should simply be that either EINVAL or ENOENT is given, so that builds on platforms which return ENOENT for this are not broken. |
0% | 0% | 300 | 300 | patch, test | 223107 | No Perforce job exists for this issue. | 2 | 32601 | 8 years, 7 weeks, 3 days ago |
Reviewed
|
0|i05xyn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1352 | server.InvalidSnapshotTest is using connection timeouts that are too short |
Bug | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 05/Jan/12 14:12 | 06/Feb/12 05:58 | 06/Feb/12 03:52 | 3.3.4 | 3.4.3, 3.3.5, 3.5.0 | tests | 0 | 1 | InvalidSnapshotTest is using connection timeouts that are too short, see this false failure: https://builds.apache.org/job/ZooKeeper_branch33_solaris/65/testReport/junit/org.apache.zookeeper.server/InvalidSnapshotTest/testInvalidSnapshot/ {noformat} org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /invalidsnap-0 at org.apache.zookeeper.KeeperException.create(KeeperException.java:90) at org.apache.zookeeper.KeeperException.create(KeeperException.java:42) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:643) at org.apache.zookeeper.server.InvalidSnapshotTest.testInvalidSnapshot(InvalidSnapshotTest.java:71) {noformat} Also in looking at the test itself it could use some cleanup (reuse features from ClientBase test utils) |
222894 | No Perforce job exists for this issue. | 4 | 32602 | 8 years, 7 weeks, 3 days ago |
Reviewed
|
0|i05xyv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1351 | invalid test verification in MultiTransactionTest |
Bug | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 04/Jan/12 17:55 | 15/Jan/12 22:56 | 15/Jan/12 21:35 | 3.4.0 | 3.4.3, 3.5.0 | tests | 0 | 1 | tests such as org.apache.zookeeper.test.MultiTransactionTest.testWatchesTriggered() are incorrect. Two issues I see 1) zk.sync is async, there is no guarantee that the watcher will be called subsequent to sync returning {noformat} zk.sync("/", null, null); assertTrue(watcher.triggered); /// incorrect assumption {noformat} The callback needs to be implemented, only once the callback is called can we verify the trigger. 2) trigger is not declared as volatile, even though it will be set in the context of a different thread (eventthread) See https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-trunk-solaris/91/testReport/junit/org.apache.zookeeper.test/MultiTransactionTest/testWatchesTriggered/ for an example of a false positive failure {noformat} junit.framework.AssertionFailedError at org.apache.zookeeper.test.MultiTransactionTest.testWatchesTriggered(MultiTransactionTest.java:236) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) {noformat} |
222765 | No Perforce job exists for this issue. | 2 | 32603 | 8 years, 10 weeks, 3 days ago | 0|i05xz3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
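The fix suggested for MultiTransactionTest in ZOOKEEPER-1351 (implement the callback, only assert after it fires, and make the flag safe to read across threads) can be sketched like this. The `AsyncApi` and `VoidCallback` interfaces are illustrative stand-ins for the real ZooKeeper client API, not its actual types.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the fix: zk.sync() is asynchronous, so the test must block until
// the server acknowledges the sync via the callback before inspecting the
// watcher flag; and the flag, set from the event thread, needs volatile
// semantics (an AtomicBoolean here). AsyncApi stands in for the real client.
public class SyncThenAssert {
    interface VoidCallback { void processResult(int rc, String path, Object ctx); }
    interface AsyncApi { void sync(String path, VoidCallback cb, Object ctx); }

    static final AtomicBoolean triggered = new AtomicBoolean(false);

    public static boolean syncAndCheck(AsyncApi zk, long timeoutMs) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        zk.sync("/", (rc, path, ctx) -> done.countDown(), null);
        // Only after the callback fires is it valid to inspect the watcher flag;
        // asserting right after sync() returns is the race the issue describes.
        if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // callback never arrived
        }
        return triggered.get();
    }
}
```

The original `assertTrue(watcher.triggered)` immediately after `zk.sync("/", null, null)` has no such ordering guarantee, which is why the failure is intermittent.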
| ZooKeeper | ZOOKEEPER-1350 | Please make JMX registration optional in LearnerZooKeeperServer |
Improvement | Patch Available | Major | Unresolved | Jordan Zimmerman | Jordan Zimmerman | Jordan Zimmerman | 03/Jan/12 17:36 | 05/Feb/20 07:11 | 3.4.0 | 3.7.0, 3.5.8 | server | 4 | 5 | LearnerZooKeeperServer has no option to disable JMX registrations. Curator has a test ZK server cluster. Due to the intricacies of JMX, the registrations cannot be easily undone. In order for the Curator Test cluster to be re-usable in a testing session, JavaAssist ugliness was necessary to make LearnerZooKeeperServer.registerJMX() and LearnerZooKeeperServer.unregisterJMX() NOPs. I suggest a simple System property. |
222617 | No Perforce job exists for this issue. | 4 | 2510 | 1 year, 45 weeks ago | 0|i00s9b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1349 | Support starting zkCli.sh in readonly mode |
Improvement | Resolved | Major | Duplicate | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 02/Jan/12 10:10 | 11/Mar/14 04:35 | 11/Mar/14 04:35 | 3.4.0 | java client | 1 | 3 | ZOOKEEPER-784 | Start ./zkCli.sh in readonly mode. The ZooKeeper client supports readonly mode; it would be desirable for the admin shell to provide the same support, so that the status can still be viewed. Suggestion: add one more parameter, as follows, specifying the r-o mode. ./zkCli.sh -server 10.18.52.144:2179:readonly |
222497 | No Perforce job exists for this issue. | 0 | 41990 | 6 years, 2 weeks, 2 days ago | 0|i07jw7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1348 | Zookeeper 3.4.2 C client incorrectly reports string version of 3.4.1 |
Bug | Resolved | Major | Fixed | Mahadev Konar | Marshall McMullen | Marshall McMullen | 30/Dec/11 21:58 | 06/Feb/12 03:20 | 06/Feb/12 03:20 | 3.4.2 | 3.4.3 | c client | 0 | 1 | When running the 3.4.2 C client, it shows the following output: Client environment:zookeeper.version=zookeeper C client 3.4.1 This should show "3.4.2" not "3.4.1". The problem looks to be caused by stale autoconf files in the C directory. grep -R "zookeeper C client 3.4.1" * autom4te.cache/output.0:@%:@ Generated by GNU Autoconf 2.59 for zookeeper C client 3.4.1. autom4te.cache/output.0:PACKAGE_STRING='zookeeper C client 3.4.1' autom4te.cache/output.0:\`configure' configures zookeeper C client 3.4.1 to adapt to many kinds of systems. autom4te.cache/output.0: short | recursive ) echo "Configuration of zookeeper C client 3.4.1:";; autom4te.cache/output.1:@%:@ Generated by GNU Autoconf 2.59 for zookeeper C client 3.4.1. autom4te.cache/output.1:PACKAGE_STRING='zookeeper C client 3.4.1' autom4te.cache/output.1:\`configure' configures zookeeper C client 3.4.1 to adapt to many kinds of systems. 
autom4te.cache/output.1: short | recursive ) echo "Configuration of zookeeper C client 3.4.1:";; config.h:#define PACKAGE_STRING "zookeeper C client 3.4.1" config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1" config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1" config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1" config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1" config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1" config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1" config.log:| #define PACKAGE_STRING "zookeeper C client 3.4.1" config.log:PACKAGE_STRING='zookeeper C client 3.4.1' config.log:#define PACKAGE_STRING "zookeeper C client 3.4.1" config.status:s,@PACKAGE_STRING@,zookeeper C client 3.4.1,;t t config.status:${ac_dA}PACKAGE_STRING${ac_dB}PACKAGE_STRING${ac_dC}"zookeeper C client 3.4.1"${ac_dD} config.status:${ac_uA}PACKAGE_STRING${ac_uB}PACKAGE_STRING${ac_uC}"zookeeper C client 3.4.1"${ac_uD} configure:# Generated by GNU Autoconf 2.59 for zookeeper C client 3.4.1. configure:PACKAGE_STRING='zookeeper C client 3.4.1' configure:\`configure' configures zookeeper C client 3.4.1 to adapt to many kinds of systems. configure: short | recursive ) echo "Configuration of zookeeper C client 3.4.1:";; Binary file libzkmt_la-zookeeper.o matches Makefile:PACKAGE_STRING = zookeeper C client 3.4.1 |
222399 | No Perforce job exists for this issue. | 0 | 32604 | 8 years, 7 weeks, 3 days ago | 0|i05xzb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1347 | ZOOKEEPER-1346 Fix the cnxns to use a concurrent data structure |
Sub-task | Open | Major | Unresolved | Unassigned | Camille Fournier | Camille Fournier | 29/Dec/11 18:09 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | server | 0 | 3 | ZOOKEEPER-1504 | Cnxns is currently stored as a HashSet but may be accessed by multiple threads concurrently. Instead of doing our own sync we should investigate using a proper concurrent data structure for this. | 222307 | No Perforce job exists for this issue. | 0 | 41991 | 5 years, 37 weeks, 5 days ago | 0|i07jwf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
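The change ZOOKEEPER-1347 asks to investigate can be sketched with a concurrent set in place of the externally-synchronized HashSet. This is an illustration of the technique, not the actual server code: `ServerCnxn` is a placeholder class, and `ConcurrentHashMap.newKeySet()` is one candidate concurrent structure among several.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: replace the hand-synchronized HashSet of connections with a
// concurrent set, so worker threads can add/remove connections while other
// threads iterate, without explicit locking. Iteration over the backing
// ConcurrentHashMap is weakly consistent rather than fail-fast.
public class CnxnsSketch {
    static class ServerCnxn { } // placeholder for the real connection type

    private final Set<ServerCnxn> cnxns = ConcurrentHashMap.newKeySet();

    public boolean add(ServerCnxn c)    { return cnxns.add(c); }
    public boolean remove(ServerCnxn c) { return cnxns.remove(c); }
    public int size()                   { return cnxns.size(); }
}
```

The trade-off versus `Collections.synchronizedSet` is that iteration needs no external lock, at the cost of weakly-consistent (possibly stale) views during concurrent updates.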
| ZooKeeper | ZOOKEEPER-1346 | Add Jetty HTTP server support for four letter words. |
Improvement | Resolved | Major | Fixed | Bill Havanki | Camille Fournier | Camille Fournier | 29/Dec/11 18:07 | 11/May/17 21:48 | 17/Jul/14 20:20 | 3.5.0 | server | 1 | 15 | ZOOKEEPER-1347 | ZOOKEEPER-1197, ZOOKEEPER-737, ZOOKEEPER-1729, ZOOKEEPER-1968 | Move the 4lws to their own port, off of the client port, and support them properly via long-lived sessions instead of polling. Deprecate the 4lw support on the client port. Will enable us to enhance the functionality of the commands via extended command syntax, address security concerns and fix bugs involving the socket close being received before all of the data on the client end. | 222305 | No Perforce job exists for this issue. | 10 | 4493 | 2 years, 44 weeks, 6 days ago |
Reviewed
|
0|i014gv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1345 | Add a .gitignore file with general exclusions and Eclipse project files excluded |
Improvement | Resolved | Trivial | Fixed | Harsh J | Harsh J | Harsh J | 29/Dec/11 11:29 | 31/Dec/11 05:57 | 30/Dec/11 16:36 | 3.5.0 | 3.4.3, 3.3.5, 3.5.0 | build | 0 | 1 | I tried looking for an .gitignore file in the ZK sources but I could not find one. Preferably, we could add one with the following: {code} # .classpath # .eclipse/ # .project # .revision/ # .settings/ # build/ # src/c/generated/ # src/java/generated/ # src/java/lib/ant-eclipse-1.0-jvm1.2.jar # src/java/lib/ivy-2.2.0.jar {code} To avoid losing much when doing "git clean -fd" and the likes while cleaning up the working repo dirs during development. This will aid those who use git mirrors for contributions a lot. |
222275 | No Perforce job exists for this issue. | 1 | 33286 | 8 years, 12 weeks, 5 days ago |
Reviewed
|
0|i0626v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1344 | ZooKeeper client multi-update command is not considering the Chroot request |
Bug | Resolved | Critical | Fixed | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 26/Dec/11 08:36 | 18/Mar/12 00:54 | 16/Mar/12 20:32 | 3.4.0 | 3.4.4, 3.5.0 | java client | 0 | 3 | For example: I have created a ZooKeeper client with the subtree (chroot) "10.18.52.144:2179/apps/X". Now an Op command is generated for the creation of the zNode "/myId". When the client creates the path "/myid", the ZooKeeper server actually creates the path as "/myid" instead of "/apps/X/myid". Expected output: the zNode has to be created as "/apps/X/myid" |
222059 | No Perforce job exists for this issue. | 5 | 32605 | 8 years, 1 week, 4 days ago |
Incompatible change
|
0|i05xzj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
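The missing behavior described in ZOOKEEPER-1344 amounts to prefixing each path inside a multi request with the client's chroot before it goes to the server, as the single-op code paths already do. The sketch below is illustrative: `prependChroot` is a hypothetical helper name, not the actual client method.

```java
// Minimal sketch of the missing chroot handling for multi ops: every path in
// the multi request should be prefixed with the client's chroot before being
// sent to the server. prependChroot is an illustrative helper, not the real
// method name in the ZooKeeper client.
public class ChrootMulti {
    public static String prependChroot(String chroot, String clientPath) {
        if (chroot == null || chroot.isEmpty() || chroot.equals("/")) {
            return clientPath; // no chroot configured: path passes through
        }
        // the client-visible root "/" maps to the chroot node itself
        return clientPath.equals("/") ? chroot : chroot + clientPath;
    }
}
```

With a chroot of "/apps/X", a multi op creating "/myid" would then be sent to the server as "/apps/X/myid", matching the expected output in the report.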
| ZooKeeper | ZOOKEEPER-1343 | getEpochToPropose should check if lastAcceptedEpoch is greater or equal than epoch |
Bug | Resolved | Critical | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 23/Dec/11 07:44 | 09/Jan/12 17:50 | 03/Jan/12 19:28 | 3.4.0 | 3.4.3, 3.5.0 | 0 | 1 | The following block in Leader.getEpochToPropose: {noformat} if (lastAcceptedEpoch > epoch) { epoch = lastAcceptedEpoch+1; } {noformat} needs to be fixed, since it doesn't increment the epoch variable in the case where epoch != -1 (the initial value) and lastAcceptedEpoch is equal to epoch. The fix is trivial and corresponds to changing > to >=. |
221962 | No Perforce job exists for this issue. | 4 | 32606 | 8 years, 11 weeks, 3 days ago |
Reviewed
|
0|i05xzr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
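The one-character fix from ZOOKEEPER-1343 can be shown in isolation. This is a simplified stand-alone version of the comparison, not the full Leader.getEpochToPropose method:

```java
// The corrected comparison from Leader.getEpochToPropose: with >=, the
// proposed epoch also advances when a follower's lastAcceptedEpoch equals the
// current candidate epoch -- the case the original > missed whenever epoch had
// already moved past its initial value of -1.
public class EpochFix {
    public static long epochToPropose(long epoch, long lastAcceptedEpoch) {
        if (lastAcceptedEpoch >= epoch) {   // was: lastAcceptedEpoch > epoch
            epoch = lastAcceptedEpoch + 1;
        }
        return epoch;
    }
}
```

With the original `>`, epochToPropose(5, 5) would leave epoch at 5 instead of advancing it to 6, so the new leader could propose an epoch a follower had already accepted.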
| ZooKeeper | ZOOKEEPER-1342 | quorum Listener & LearnerCnxAcceptor are missing thread names |
Improvement | Resolved | Minor | Fixed | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 23/Dec/11 00:15 | 22/Apr/13 16:02 | 27/Dec/11 19:29 | 3.5.0 | quorum | 0 | 2 | derby_triage10_5_2 | 221922 | No Perforce job exists for this issue. | 2 | 33287 | 8 years, 13 weeks, 1 day ago |
Reviewed
|
0|i06273: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1341 | problem handling invalid multi op in processTxn |
Bug | Open | Major | Unresolved | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 22/Dec/11 14:55 | 22/Dec/11 14:55 | 3.4.0 | server | 0 | 0 | The handling of an invalid multi op in org.apache.zookeeper.server.DataTree.processTxn(TxnHeader, Record) is unusual and looks wrong to me. In particular, an IOException is thrown and then essentially ignored; it seems to me we should fail the operation properly instead. This will be more important if we add new op types going forward. Use of assert is a bit suspect as well, however perhaps it's fine... not sure. (We don't explicitly turn on assertions in our tests, so not sure how useful it is regardless.) Also notice that the catch of IOException is ignoring the result. It seems to me that handling this exception should be localized to the multi block (separating it out into its own method seems like a good idea). We should add a test for this case. |
221885 | No Perforce job exists for this issue. | 0 | 32607 | 8 years, 14 weeks ago | 0|i05xzz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1340 | multi problem - typical user operations are generating ERROR level messages in the server |
Bug | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 22/Dec/11 13:03 | 06/Feb/12 05:58 | 06/Feb/12 04:50 | 3.4.0 | 3.4.3, 3.5.0 | server | 0 | 1 | Multi operations run by users are generating ERROR level messages in the server log even though they are typical user level operations that are not in any way impacting the server, example: {noformat} 2011-12-22 09:55:06,538 [myid:] - ERROR [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@545] - >>>> Got user-level KeeperException when processing sessionid:0x13466e9828c0000 type:multi cxid:0x3 zxid:0x2 txntype:2 reqpath:n/a Error Path:/nonexisting Error:KeeperErrorCode = NoNode for /nonexisting 2011-12-22 09:55:06,538 [myid:] - ERROR [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@549] - >>>> ABORTING remaing MultiOp ops {noformat} This is misleading. We should demote these messages to INFO level at the highest. (this is what we do for other such user operations, e.g. nonode) |
221866 | No Perforce job exists for this issue. | 2 | 32608 | 8 years, 7 weeks, 3 days ago | Unwanted ERROR messages in the logs. | 0|i05y07: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1339 | C client doesn't build with --enable-debug |
Bug | Resolved | Major | Fixed | Eric Liang | Jakub Lekstan | Jakub Lekstan | 22/Dec/11 05:00 | 08/May/12 14:04 | 08/May/12 12:39 | 3.4.1 | 3.3.6, 3.4.4, 3.5.0 | c client | 0 | 3 | Ubuntu 11.04 | When I'm trying to build 3.4.1 c client with --enable-debug switch I'm getting following error: {code} make all-am make[1]: Entering directory `/home/jlekstan/zookeeper-3.4.1/src/c' if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF ".deps/zookeeper.Tpo" -c -o zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c; \ then mv -f ".deps/zookeeper.Tpo" ".deps/zookeeper.Plo"; else rm -f ".deps/zookeeper.Tpo"; exit 1; fi mkdir .libs gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -fPIC -DPIC -o .libs/zookeeper.o gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -o zookeeper.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT recordio.lo -MD -MP -MF ".deps/recordio.Tpo" -c -o recordio.lo `test -f 'src/recordio.c' || echo './'`src/recordio.c; \ then mv -f ".deps/recordio.Tpo" ".deps/recordio.Plo"; else rm -f ".deps/recordio.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT recordio.lo -MD -MP -MF .deps/recordio.Tpo -c src/recordio.c -fPIC -DPIC -o .libs/recordio.o gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT recordio.lo -MD -MP -MF .deps/recordio.Tpo -c src/recordio.c -o recordio.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.jute.lo -MD -MP -MF ".deps/zookeeper.jute.Tpo" -c -o zookeeper.jute.lo `test -f 'generated/zookeeper.jute.c' || echo './'`generated/zookeeper.jute.c; \ then mv -f ".deps/zookeeper.jute.Tpo" ".deps/zookeeper.jute.Plo"; else rm -f ".deps/zookeeper.jute.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.jute.lo -MD -MP -MF .deps/zookeeper.jute.Tpo -c generated/zookeeper.jute.c -fPIC -DPIC -o .libs/zookeeper.jute.o gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zookeeper.jute.lo -MD -MP -MF .deps/zookeeper.jute.Tpo -c generated/zookeeper.jute.c -o zookeeper.jute.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_log.lo -MD -MP -MF ".deps/zk_log.Tpo" -c -o zk_log.lo `test -f 'src/zk_log.c' || echo './'`src/zk_log.c; \ then mv -f ".deps/zk_log.Tpo" ".deps/zk_log.Plo"; else rm -f ".deps/zk_log.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_log.lo -MD -MP -MF .deps/zk_log.Tpo -c src/zk_log.c -fPIC -DPIC -o .libs/zk_log.o gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_log.lo -MD -MP -MF .deps/zk_log.Tpo -c src/zk_log.c -o zk_log.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_hashtable.lo -MD -MP -MF ".deps/zk_hashtable.Tpo" -c -o zk_hashtable.lo `test -f 'src/zk_hashtable.c' || echo './'`src/zk_hashtable.c; \ then mv -f ".deps/zk_hashtable.Tpo" ".deps/zk_hashtable.Plo"; else rm -f ".deps/zk_hashtable.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_hashtable.lo -MD -MP -MF .deps/zk_hashtable.Tpo -c src/zk_hashtable.c -fPIC -DPIC -o .libs/zk_hashtable.o gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT zk_hashtable.lo -MD -MP -MF .deps/zk_hashtable.Tpo -c src/zk_hashtable.c -o zk_hashtable.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT st_adaptor.lo -MD -MP -MF ".deps/st_adaptor.Tpo" -c -o st_adaptor.lo `test -f 'src/st_adaptor.c' || echo './'`src/st_adaptor.c; \ then mv -f ".deps/st_adaptor.Tpo" ".deps/st_adaptor.Plo"; else rm -f ".deps/st_adaptor.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT st_adaptor.lo -MD -MP -MF .deps/st_adaptor.Tpo -c src/st_adaptor.c -fPIC -DPIC -o .libs/st_adaptor.o gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT st_adaptor.lo -MD -MP -MF .deps/st_adaptor.Tpo -c src/st_adaptor.c -o st_adaptor.o >/dev/null 2>&1 /bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o libzkst.la zookeeper.lo recordio.lo zookeeper.jute.lo zk_log.lo zk_hashtable.lo st_adaptor.lo -lm ar cru .libs/libzkst.a .libs/zookeeper.o .libs/recordio.o .libs/zookeeper.jute.o .libs/zk_log.o .libs/zk_hashtable.o .libs/st_adaptor.o ranlib .libs/libzkst.a creating libzkst.la (cd .libs && rm -f libzkst.la && ln -s ../libzkst.la libzkst.la) if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable_itr.lo -MD -MP -MF ".deps/hashtable_itr.Tpo" -c -o hashtable_itr.lo `test -f 'src/hashtable/hashtable_itr.c' || echo './'`src/hashtable/hashtable_itr.c; \ then mv -f ".deps/hashtable_itr.Tpo" ".deps/hashtable_itr.Plo"; else rm -f ".deps/hashtable_itr.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable_itr.lo -MD -MP -MF .deps/hashtable_itr.Tpo -c src/hashtable/hashtable_itr.c -fPIC -DPIC -o .libs/hashtable_itr.o gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable_itr.lo -MD -MP -MF .deps/hashtable_itr.Tpo -c src/hashtable/hashtable_itr.c -o hashtable_itr.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable.lo -MD -MP -MF ".deps/hashtable.Tpo" -c -o hashtable.lo `test -f 'src/hashtable/hashtable.c' || echo './'`src/hashtable/hashtable.c; \ then mv -f ".deps/hashtable.Tpo" ".deps/hashtable.Plo"; else rm -f ".deps/hashtable.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable.lo -MD -MP -MF .deps/hashtable.Tpo -c src/hashtable/hashtable.c -fPIC -DPIC -o .libs/hashtable.o gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT hashtable.lo -MD -MP -MF .deps/hashtable.Tpo -c src/hashtable/hashtable.c -o hashtable.o >/dev/null 2>&1 /bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o libhashtable.la hashtable_itr.lo hashtable.lo ar cru .libs/libhashtable.a .libs/hashtable_itr.o .libs/hashtable.o ranlib .libs/libhashtable.a creating libhashtable.la (cd .libs && rm -f libhashtable.la && ln -s ../libhashtable.la libhashtable.la) /bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o libzookeeper_st.la -rpath /usr/local/lib -no-undefined -version-info 2 -export-symbols-regex '(zoo_|zookeeper_|zhandle|Z|format_log_message|log_message|logLevel|deallocate_|zerror|is_unrecoverable)' libzkst.la libhashtable.la generating symbol list for `libzookeeper_st.la' /usr/bin/nm -B ./.libs/libzkst.a ./.libs/libhashtable.a | sed -n -e 's/^.*[ ]\([ABCDGIRSTW][ABCDGIRSTW]*\)[ ][ ]*\([_A-Za-z][_A-Za-z0-9]*\)$/\1 \2 \2/p' | /bin/sed 's/.* //' | sort | uniq > .libs/libzookeeper_st.exp grep -E -e "(zoo_|zookeeper_|zhandle|Z|format_log_message|log_message|logLevel|deallocate_|zerror|is_unrecoverable)" ".libs/libzookeeper_st.exp" > ".libs/libzookeeper_st.expT" mv -f ".libs/libzookeeper_st.expT" ".libs/libzookeeper_st.exp" echo "{ global:" > .libs/libzookeeper_st.ver cat .libs/libzookeeper_st.exp | sed -e "s/\(.*\)/\1;/" >> .libs/libzookeeper_st.ver echo "local: *; };" >> .libs/libzookeeper_st.ver gcc -shared -Wl,--whole-archive ./.libs/libzkst.a ./.libs/libhashtable.a -Wl,--no-whole-archive -lm -Wl,-soname -Wl,libzookeeper_st.so.2 -Wl,-version-script -Wl,.libs/libzookeeper_st.ver -o .libs/libzookeeper_st.so.2.0.0 (cd .libs && rm -f libzookeeper_st.so.2 && ln -s 
libzookeeper_st.so.2.0.0 libzookeeper_st.so.2) (cd .libs && rm -f libzookeeper_st.so && ln -s libzookeeper_st.so.2.0.0 libzookeeper_st.so) rm -fr .libs/libzookeeper_st.lax mkdir .libs/libzookeeper_st.lax rm -fr .libs/libzookeeper_st.lax/libzkst.a mkdir .libs/libzookeeper_st.lax/libzkst.a (cd .libs/libzookeeper_st.lax/libzkst.a && ar x /home/jlekstan/zookeeper-3.4.1/src/c/./.libs/libzkst.a) rm -fr .libs/libzookeeper_st.lax/libhashtable.a mkdir .libs/libzookeeper_st.lax/libhashtable.a (cd .libs/libzookeeper_st.lax/libhashtable.a && ar x /home/jlekstan/zookeeper-3.4.1/src/c/./.libs/libhashtable.a) ar cru .libs/libzookeeper_st.a .libs/libzookeeper_st.lax/libzkst.a/zookeeper.o .libs/libzookeeper_st.lax/libzkst.a/st_adaptor.o .libs/libzookeeper_st.lax/libzkst.a/recordio.o .libs/libzookeeper_st.lax/libzkst.a/zk_hashtable.o .libs/libzookeeper_st.lax/libzkst.a/zk_log.o .libs/libzookeeper_st.lax/libzkst.a/zookeeper.jute.o .libs/libzookeeper_st.lax/libhashtable.a/hashtable_itr.o .libs/libzookeeper_st.lax/libhashtable.a/hashtable.o ranlib .libs/libzookeeper_st.a rm -fr .libs/libzookeeper_st.lax creating libzookeeper_st.la (cd .libs && rm -f libzookeeper_st.la && ln -s ../libzookeeper_st.la libzookeeper_st.la) if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.lo -MD -MP -MF ".deps/libzkmt_la-zookeeper.Tpo" -c -o libzkmt_la-zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c; \ then mv -f ".deps/libzkmt_la-zookeeper.Tpo" ".deps/libzkmt_la-zookeeper.Plo"; else rm -f ".deps/libzkmt_la-zookeeper.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.lo -MD -MP -MF .deps/libzkmt_la-zookeeper.Tpo -c src/zookeeper.c -fPIC -DPIC -o .libs/libzkmt_la-zookeeper.o gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.lo -MD -MP -MF .deps/libzkmt_la-zookeeper.Tpo -c src/zookeeper.c -o libzkmt_la-zookeeper.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-recordio.lo -MD -MP -MF ".deps/libzkmt_la-recordio.Tpo" -c -o libzkmt_la-recordio.lo `test -f 'src/recordio.c' || echo './'`src/recordio.c; \ then mv -f ".deps/libzkmt_la-recordio.Tpo" ".deps/libzkmt_la-recordio.Plo"; else rm -f ".deps/libzkmt_la-recordio.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-recordio.lo -MD -MP -MF .deps/libzkmt_la-recordio.Tpo -c src/recordio.c -fPIC -DPIC -o .libs/libzkmt_la-recordio.o gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-recordio.lo -MD -MP -MF .deps/libzkmt_la-recordio.Tpo -c src/recordio.c -o libzkmt_la-recordio.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.jute.lo -MD -MP -MF ".deps/libzkmt_la-zookeeper.jute.Tpo" -c -o libzkmt_la-zookeeper.jute.lo `test -f 'generated/zookeeper.jute.c' || echo './'`generated/zookeeper.jute.c; \ then mv -f ".deps/libzkmt_la-zookeeper.jute.Tpo" ".deps/libzkmt_la-zookeeper.jute.Plo"; else rm -f ".deps/libzkmt_la-zookeeper.jute.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.jute.lo -MD -MP -MF .deps/libzkmt_la-zookeeper.jute.Tpo -c generated/zookeeper.jute.c -fPIC -DPIC -o .libs/libzkmt_la-zookeeper.jute.o gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zookeeper.jute.lo -MD -MP -MF .deps/libzkmt_la-zookeeper.jute.Tpo -c generated/zookeeper.jute.c -o libzkmt_la-zookeeper.jute.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_log.lo -MD -MP -MF ".deps/libzkmt_la-zk_log.Tpo" -c -o libzkmt_la-zk_log.lo `test -f 'src/zk_log.c' || echo './'`src/zk_log.c; \ then mv -f ".deps/libzkmt_la-zk_log.Tpo" ".deps/libzkmt_la-zk_log.Plo"; else rm -f ".deps/libzkmt_la-zk_log.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_log.lo -MD -MP -MF .deps/libzkmt_la-zk_log.Tpo -c src/zk_log.c -fPIC -DPIC -o .libs/libzkmt_la-zk_log.o gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_log.lo -MD -MP -MF .deps/libzkmt_la-zk_log.Tpo -c src/zk_log.c -o libzkmt_la-zk_log.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_hashtable.lo -MD -MP -MF ".deps/libzkmt_la-zk_hashtable.Tpo" -c -o libzkmt_la-zk_hashtable.lo `test -f 'src/zk_hashtable.c' || echo './'`src/zk_hashtable.c; \ then mv -f ".deps/libzkmt_la-zk_hashtable.Tpo" ".deps/libzkmt_la-zk_hashtable.Plo"; else rm -f ".deps/libzkmt_la-zk_hashtable.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_hashtable.lo -MD -MP -MF .deps/libzkmt_la-zk_hashtable.Tpo -c src/zk_hashtable.c -fPIC -DPIC -o .libs/libzkmt_la-zk_hashtable.o gcc -DHAVE_CONFIG_H -I. -I. -I. 
-I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-zk_hashtable.lo -MD -MP -MF .deps/libzkmt_la-zk_hashtable.Tpo -c src/zk_hashtable.c -o libzkmt_la-zk_hashtable.o >/dev/null 2>&1 if /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-mt_adaptor.lo -MD -MP -MF ".deps/libzkmt_la-mt_adaptor.Tpo" -c -o libzkmt_la-mt_adaptor.lo `test -f 'src/mt_adaptor.c' || echo './'`src/mt_adaptor.c; \ then mv -f ".deps/libzkmt_la-mt_adaptor.Tpo" ".deps/libzkmt_la-mt_adaptor.Plo"; else rm -f ".deps/libzkmt_la-mt_adaptor.Tpo"; exit 1; fi gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-mt_adaptor.lo -MD -MP -MF .deps/libzkmt_la-mt_adaptor.Tpo -c src/mt_adaptor.c -fPIC -DPIC -o .libs/libzkmt_la-mt_adaptor.o gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -DTHREADED -g -O0 -D_GNU_SOURCE -MT libzkmt_la-mt_adaptor.lo -MD -MP -MF .deps/libzkmt_la-mt_adaptor.Tpo -c src/mt_adaptor.c -o libzkmt_la-mt_adaptor.o >/dev/null 2>&1 /bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o libzkmt.la libzkmt_la-zookeeper.lo libzkmt_la-recordio.lo libzkmt_la-zookeeper.jute.lo libzkmt_la-zk_log.lo libzkmt_la-zk_hashtable.lo libzkmt_la-mt_adaptor.lo -lm ar cru .libs/libzkmt.a .libs/libzkmt_la-zookeeper.o .libs/libzkmt_la-recordio.o .libs/libzkmt_la-zookeeper.jute.o .libs/libzkmt_la-zk_log.o .libs/libzkmt_la-zk_hashtable.o .libs/libzkmt_la-mt_adaptor.o ranlib .libs/libzkmt.a creating libzkmt.la (cd .libs && rm -f libzkmt.la && ln -s ../libzkmt.la libzkmt.la) /bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o libzookeeper_mt.la -rpath /usr/local/lib -no-undefined -version-info 2 -export-symbols-regex '(zoo_|zookeeper_|zhandle|Z|format_log_message|log_message|logLevel|deallocate_|zerror|is_unrecoverable)' libzkmt.la 
libhashtable.la -lpthread generating symbol list for `libzookeeper_mt.la' /usr/bin/nm -B ./.libs/libzkmt.a ./.libs/libhashtable.a | sed -n -e 's/^.*[ ]\([ABCDGIRSTW][ABCDGIRSTW]*\)[ ][ ]*\([_A-Za-z][_A-Za-z0-9]*\)$/\1 \2 \2/p' | /bin/sed 's/.* //' | sort | uniq > .libs/libzookeeper_mt.exp grep -E -e "(zoo_|zookeeper_|zhandle|Z|format_log_message|log_message|logLevel|deallocate_|zerror|is_unrecoverable)" ".libs/libzookeeper_mt.exp" > ".libs/libzookeeper_mt.expT" mv -f ".libs/libzookeeper_mt.expT" ".libs/libzookeeper_mt.exp" echo "{ global:" > .libs/libzookeeper_mt.ver cat .libs/libzookeeper_mt.exp | sed -e "s/\(.*\)/\1;/" >> .libs/libzookeeper_mt.ver echo "local: *; };" >> .libs/libzookeeper_mt.ver gcc -shared -Wl,--whole-archive ./.libs/libzkmt.a ./.libs/libhashtable.a -Wl,--no-whole-archive -lm -lpthread -Wl,-soname -Wl,libzookeeper_mt.so.2 -Wl,-version-script -Wl,.libs/libzookeeper_mt.ver -o .libs/libzookeeper_mt.so.2.0.0 (cd .libs && rm -f libzookeeper_mt.so.2 && ln -s libzookeeper_mt.so.2.0.0 libzookeeper_mt.so.2) (cd .libs && rm -f libzookeeper_mt.so && ln -s libzookeeper_mt.so.2.0.0 libzookeeper_mt.so) rm -fr .libs/libzookeeper_mt.lax mkdir .libs/libzookeeper_mt.lax rm -fr .libs/libzookeeper_mt.lax/libzkmt.a mkdir .libs/libzookeeper_mt.lax/libzkmt.a (cd .libs/libzookeeper_mt.lax/libzkmt.a && ar x /home/jlekstan/zookeeper-3.4.1/src/c/./.libs/libzkmt.a) rm -fr .libs/libzookeeper_mt.lax/libhashtable.a mkdir .libs/libzookeeper_mt.lax/libhashtable.a (cd .libs/libzookeeper_mt.lax/libhashtable.a && ar x /home/jlekstan/zookeeper-3.4.1/src/c/./.libs/libhashtable.a) ar cru .libs/libzookeeper_mt.a .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-zk_hashtable.o .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-zookeeper.o .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-zk_log.o .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-zookeeper.jute.o .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-recordio.o .libs/libzookeeper_mt.lax/libzkmt.a/libzkmt_la-mt_adaptor.o 
.libs/libzookeeper_mt.lax/libhashtable.a/hashtable_itr.o .libs/libzookeeper_mt.lax/libhashtable.a/hashtable.o ranlib .libs/libzookeeper_mt.a rm -fr .libs/libzookeeper_mt.lax creating libzookeeper_mt.la (cd .libs && rm -f libzookeeper_mt.la && ln -s ../libzookeeper_mt.la libzookeeper_mt.la) if gcc -DHAVE_CONFIG_H -I. -I. -I. -I./include -I./tests -I./generated -Wall -Werror -g -O0 -D_GNU_SOURCE -MT cli.o -MD -MP -MF ".deps/cli.Tpo" -c -o cli.o `test -f 'src/cli.c' || echo './'`src/cli.c; \ then mv -f ".deps/cli.Tpo" ".deps/cli.Po"; else rm -f ".deps/cli.Tpo"; exit 1; fi /bin/bash ./libtool --tag=CC --mode=link gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o cli_st cli.o libzookeeper_st.la gcc -Wall -Werror -g -O0 -D_GNU_SOURCE -o .libs/cli_st cli.o ./.libs/libzookeeper_st.so -lm ./.libs/libzookeeper_st.so: undefined reference to `hashtable_iterator_value' ./.libs/libzookeeper_st.so: undefined reference to `hashtable_iterator_key' collect2: ld returned 1 exit status make[1]: *** [cli_st] Error 1 make[1]: Leaving directory `/home/jlekstan/zookeeper-3.4.1/src/c' make: *** [all] Error 2 {code} |
221824 | No Perforce job exists for this issue. | 2 | 32609 | 7 years, 46 weeks, 2 days ago | 0|i05y0f: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1338 | class cast exceptions may be thrown by multi ErrorResult class (invalid equals) |
Bug | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 21/Dec/11 19:30 | 06/Feb/12 05:58 | 06/Feb/12 05:16 | 3.4.0 | 3.4.3, 3.5.0 | java client | 0 | 1 | There's a bug in ErrorResult and perhaps some of the other OpResult equals methods in multi. | 221789 | No Perforce job exists for this issue. | 3 | 32610 | 8 years, 7 weeks, 3 days ago | 0|i05y0n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1337 | multi's "Transaction" class is missing tests. |
Test | Resolved | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 21/Dec/11 19:19 | 06/Feb/12 05:58 | 06/Feb/12 05:00 | 3.4.0 | 3.4.3, 3.5.0 | java client | 0 | 0 | Add tests for zookeeper client transaction() method. | 221787 | No Perforce job exists for this issue. | 2 | 33288 | 8 years, 7 weeks, 3 days ago |
Reviewed
|
0|i0627b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1336 | javadoc for multi is confusing, references functionality that doesn't seem to exist |
Bug | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 21/Dec/11 18:24 | 06/Feb/12 05:58 | 06/Feb/12 03:58 | 3.4.1 | 3.4.3, 3.5.0 | java client | 0 | 1 | There's this in org.apache.zookeeper.ZooKeeper.multi(Iterable<Op>) {noformat} * Executes multiple Zookeeper operations or none of them. On success, a list of results is returned. * On failure, only a single exception is returned. If you want more details, it may be preferable to * use the alternative form of this method that lets you pass a list into which individual results are * placed so that you can zero in on exactly which operation failed and why. {noformat} What is the "alternate form of this method" that's being referred to? Seems like we should add this functionality, or at the very least update the javadoc. (I don't think this is referring to Transaction, although the docs there are pretty thin) |
221782 | No Perforce job exists for this issue. | 1 | 32611 | 8 years, 7 weeks, 3 days ago | Improved Javadoc for multi API's. |
Reviewed
|
0|i05y0v: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1335 | Add support for --config to zkEnv.sh to specify a config directory different than what is expected |
Improvement | Resolved | Major | Fixed | Arpit Gupta | Arpit Gupta | Arpit Gupta | 20/Dec/11 16:56 | 17/Dec/12 06:04 | 17/Dec/12 01:11 | 3.5.0 | 0 | 2 | zkEnv.sh expects the ZOOCFGDIR env variable to be set. If not, it looks for the conf dir in the ZOOKEEPER_PREFIX dir or in /etc/zookeeper. It would be great if we could support a --config option so that at run time you could specify a different config directory. We do the same thing in Hadoop. With this you should be able to do /usr/sbin/zkServer.sh --config /some/conf/dir start|stop |
221592 | No Perforce job exists for this issue. | 2 | 41992 | 7 years, 14 weeks, 3 days ago | 0|i07jwn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1334 | Zookeeper 3.4.x is not OSGi compliant - MANIFEST.MF is flawed |
Bug | Closed | Major | Fixed | Claus Ibsen | Claus Ibsen | Claus Ibsen | 20/Dec/11 11:31 | 08/Oct/14 11:55 | 19/Dec/12 03:05 | 3.4.0 | 3.4.6, 3.5.0 | 3 | 12 | ZOOKEEPER-2056, ZOOKEEPER-1647, ZOOKEEPER-1078, CAMEL-4803 | In Zookeeper 3.3.x you use log4j for logging, and the maven dep is eg from 3.3.4 {code} <dependency> <groupId>log4j</groupId> <artifactId>log4j</artifactId> <version>1.2.15</version> <scope>compile</scope> </dependency> {code} Now in 3.4.0 or better you changed to use slf4j also/instead. The maven pom.xml now includes: {code} <dependency> <groupId>org.slf4j</groupId> <artifactId>slf4j-api</artifactId> <version>1.6.1</version> <scope>compile</scope> </dependency> <dependency> <groupId>org.slf4j</groupId> <artifactId>slf4j-log4j12</artifactId> <version>1.6.1</version> <scope>compile</scope> </dependency> <dependency> <groupId>log4j</groupId> <artifactId>log4j</artifactId> <version>1.2.15</version> <scope>compile</scope> </dependency> {code} But the META-INF/MANIFEST.MF file in the distribution did not change to reflect this. The 3.3.4 MANIFEST.MF, import packages {code} Import-Package: javax.management,org.apache.log4j,org.osgi.framework;v ersion="[1.4,2.0)",org.osgi.util.tracker;version="[1.1,2.0)" {code} And the 3.4.1 MANIFEST.MF, import packages: {code} Import-Package: javax.management,org.apache.log4j,org.osgi.framework;v ersion="[1.4,2.0)",org.osgi.util.tracker;version="[1.1,2.0)" {code} This makes using zookeeper 3.4.x in OSGi environments not possible, as we get NoClassDefFoundException for slf4j classes. |
221549 | No Perforce job exists for this issue. | 3 | 32612 | 6 years, 2 weeks ago |
Reviewed
|
0|i05y13: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
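Both manifests quoted in the ZOOKEEPER-1334 description import only org.apache.log4j, while the 3.4.x jar now also references the slf4j packages it depends on. A corrected Import-Package header would need to add those packages; the sketch below is illustrative only (the slf4j version range is inferred from the 1.6.1 dependency quoted above, and is not taken from the eventual fix):

```
Import-Package: javax.management,org.apache.log4j,org.slf4j;version="[1.
 6,2)",org.osgi.framework;version="[1.4,2.0)",org.osgi.util.tracker;ver
 sion="[1.1,2.0)"
```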
| ZooKeeper | ZOOKEEPER-1333 | NPE in FileTxnSnapLog when restarting a cluster |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | Andrew McNair | Andrew McNair | 19/Dec/11 20:24 | 29/Dec/11 18:46 | 21/Dec/11 15:40 | 3.4.0 | 3.4.2, 3.5.0 | server | 0 | 7 | I think a NPE was created in the fix for https://issues.apache.org/jira/browse/ZOOKEEPER-1269 Looking in DataTree.processTxn(TxnHeader header, Record txn) it seems likely that if rc.err != Code.OK then rc.path will be null. I'm currently working on a minimal test case for the bug, I'll attach it to this issue when it's ready. java.lang.NullPointerException at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:203) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:150) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223) at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:418) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:410) at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) |
221456 | No Perforce job exists for this issue. | 7 | 32613 | 8 years, 14 weeks ago |
Reviewed
|
0|i05y1b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1332 | Zookeeper data is not in sync with quorum in the mentioned scenario |
Bug | Resolved | Major | Duplicate | Unassigned | amith | amith | 19/Dec/11 04:32 | 19/Dec/11 07:03 | 19/Dec/11 07:03 | 3.4.0 | 3.4.1 | server | 0 | 0 | 3 zookeeper quorum | Please check the below mentioned scenario:- 1. Configure 3 zookeeper servers in quorum 2. Start zk1 (F) and zk2(L) from a java client create a node(client connect to zk2) 3. Stop the zk2 (L) 4. Start the zk3, Now FLE is successful but zookeeper-3 is not having the node created In step 4 Zookeeper-3 is getting a diff from the leader 2011-12-19 20:15:59,379 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Environment@98] - Server environment:user.home=/root 2011-12-19 20:15:59,379 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Environment@98] - Server environment:user.dir=/home/amith/OpenSrc/zookeeper/zookeeper3/bin 2011-12-19 20:15:59,381 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:ZooKeeperServer@168] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir ../dataDir/version-2 snapdir ../dataDir/version-2 2011-12-19 20:15:59,382 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 102 2011-12-19 20:15:59,403 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Learner@322] - Getting a diff from the leader 0x10000000a 2011-12-19 20:15:59,449 [myid:3] - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Learner@372] - Got zxid 0x10000000a expected 0x1 2011-12-19 20:15:59,450 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:FileTxnSnapLog@255] - Snapshotting: 10000000a but in the diff all the required data is not obtained ...! Here I think zookeeper-3 should get snapshot from leader and not Diff |
221357 | No Perforce job exists for this issue. | 0 | 32614 | 8 years, 14 weeks, 3 days ago | 0|i05y1j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1331 | Typo in docs: acheive -> achieve |
Bug | Resolved | Minor | Fixed | Andrew Ash | Andrew Ash | Andrew Ash | 19/Dec/11 02:10 | 28/Dec/11 05:58 | 27/Dec/11 19:39 | 3.2.2 | 3.5.0 | documentation | 0 | 1 | Found this typo while reading docs. Attaching SVN patch | 221343 | No Perforce job exists for this issue. | 3 | 32615 | 8 years, 13 weeks, 1 day ago |
Reviewed
|
0|i05y1r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1330 | Zookeeper server not serving the client request even after completion of Leader election |
Bug | Open | Minor | Unresolved | Unassigned | amith | amith | 19/Dec/11 00:21 | 05/Feb/20 07:17 | 3.4.0 | 3.7.0, 3.5.8 | server | 0 | 8 | 3 zk quorum | Have a cluster of 3 zookeepers 90 clients are connected to the server leader got killed and started the other 2 zookeeper started FLE and Leader was elected But its taking nearly 10 sec for this server to server requests and saying "ZooKeeperServer not running" message..? Why is this even after Leader election SERVER IS NOT RUNNING !!!!!!!!!! 2011-12-19 16:12:29,732 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2011-12-19 16:12:29,733 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1000] - Closed socket connection for client /10.18.47.148:51965 (no session established for client) 2011-12-19 16:12:29,753 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:QuorumPeer@747] - LEADING 2011-12-19 16:12:29,762 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:Leader@58] - TCP NoDelay set to: true 2011-12-19 16:12:29,765 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:ZooKeeperServer@168] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir ../dataDir/version-2 snapdir ../dataDir/version-2 2011-12-19 16:12:29,766 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:Leader@294] - LEADING - LEADER ELECTION TOOK - 4663 2011-12-19 16:12:29,776 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2182:FileSnap@83] - Reading snapshot ../dataDir/version-2/snapshot.100013661 2011-12-19 16:12:29,831 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@213] - Accepted socket connection from /10.18.47.148:51982 2011-12-19 16:12:29,831 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer 
not running 2011-12-19 16:12:29,832 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@1000] - Closed socket connection for client /10.18.47.148:51982 (no session established for client) 2011-12-19 16:12:29,884 [myid:2] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxnFactory@213] - Accepted socket connection from /10.18.47.148:51989 2011-12-19 16:12:29,884 [myid:2] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2182:NIOServerCnxn@354] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running |
221339 | No Perforce job exists for this issue. | 0 | 32616 | 22 weeks, 3 days ago | 0|i05y1z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1329 | Lock recipe sorts sequenced children incorrectly |
Bug | Open | Major | Unresolved | Unassigned | Evan McClure | Evan McClure | 15/Dec/11 19:04 | 15/Dec/11 19:05 | 3.3.3 | recipes | 1 | 2 | Mac OS X Version 10.6.8 Darwin emcclure-lt-mac.local 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386 Homebrew 0.8 |
The lock recipe sorts sequenced children lexicographically. When the sequence number wraps, a lexicographic comparison will always place -2147483648 ahead of 2147483647, place -2147483647 ahead of -2147483648, and place -1 ahead of -2. Clearly, we want 2147483647 < -2147483648, -2147483648 < -2147483647, and -2 placed ahead of -1, since those sequence numbers were generated in that order. I suggest that the sequence numbers be converted to unsigned numbers before being compared in the comparison functor that gets passed to qsort(). This leaves us with another issue. When comparing unsigned sequence numbers, there is a slim chance that 4294967295 needs to sort ahead of 0. So, I suggest that a fudge range be used, say, the number of nodes in the quorum * some fudge factor, in order to handle this comparison. Please close this if I'm way off base here. |
221077 | No Perforce job exists for this issue. | 0 | 32617 | 8 years, 15 weeks ago | 0|i05y27: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1328 | Misplaced assertion for the test case 'FLELostMessageTest' and not identifying misfunctions |
Test | Resolved | Major | Fixed | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 13/Dec/11 00:46 | 03/Sep/12 07:01 | 03/Sep/12 01:59 | 3.4.0 | 3.5.0 | leaderElection | 0 | 5 | The assertion for testLostMessage is kept inside the thread's run() method. Because of this, an assertion failure is not reflected in the main test case; I have observed that the test case still passes even when the assertion fails or the election misbehaves. Instead, the assertion can be moved into the test case itself - testLostMessage. {noformat} class LEThread extends Thread { public void run(){ peer.setCurrentVote(v); LOG.info("Finished election: " + i + ", " + v.getId()); Assert.assertTrue("State is not leading.", peer.getPeerState() == ServerState.LEADING); } {noformat} |
220589 | No Perforce job exists for this issue. | 3 | 33289 | 7 years, 29 weeks, 3 days ago |
Reviewed
|
0|i0627j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1327 | there are still remnants of hadoop urls |
Bug | Resolved | Major | Fixed | Harsh J | Benjamin Reed | Benjamin Reed | 12/Dec/11 23:25 | 06/Feb/12 05:58 | 06/Feb/12 04:17 | 3.4.3, 3.5.0 | 0 | 2 | there are still hadoop urls and references to zookeeper lists under the hadoop project in the sources. | 220587 | No Perforce job exists for this issue. | 3 | 32618 | 8 years, 7 weeks, 3 days ago | Remove links to Hadoop wiki's in ZooKeeper docs. |
Reviewed
|
0|i05y2f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1326 | The CLI commands "delete" and "rmr" are confusing. Can we have "delete" + "deleteall" instead? |
Wish | Resolved | Trivial | Fixed | Harsh J | Harsh J | Harsh J | 11/Dec/11 14:56 | 28/Dec/11 11:13 | 27/Dec/11 18:53 | 3.4.0 | 3.5.0 | java client | 0 | 2 | ZOOKEEPER-729 | ZOOKEEPER-729 introduced 'rmr' for recursive 'delete' operations on a given node. Going by the unix convention, wouldn't it be much better if we were to have an 'rm' if there was an 'rmr' added? The current set is confusing. Or should we have 'delete' and 'deleteall' or summat? I know this is a nitpick, but I just dislike to see bad keywords used for commands. I'm OK to produce a backwards-compatible patch if this is acceptable. |
220398 | No Perforce job exists for this issue. | 2 | 33290 | 8 years, 13 weeks, 1 day ago |
Reviewed
|
0|i0627r: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1325 | Log maxClientCnxn warning in INFO level |
Improvement | Resolved | Minor | Invalid | Unassigned | Mubarak Seyed | Mubarak Seyed | 09/Dec/11 17:09 | 09/Dec/11 18:36 | 09/Dec/11 17:30 | 3.3.3, 3.3.4, 3.4.0 | server | 0 | 0 | When the HBase client ZooKeeperWatcher gets a ConnectionLossException (/hbase/rs or /hbase), it is very hard to debug the ZK server log if the ZK server has been started at the log4j INFO level. When the maxClientCnxn limit is reached for a single client (at the socket level), it would be nice to log it at INFO level instead of WARN. It will help HBase clients (Region server, HMaster, and the HBase client lib) to debug the issue in production. {code} 3.4 - src/java/main/org/apache/zookeeper/server/NIOServerCnxnFactory.java 3.3.4 - src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java public void run() { while (!ss.socket().isClosed()) { try { ... ... if (maxClientCnxns > 0 && cnxncount >= maxClientCnxns){ LOG.info("Too many connections from " + ia + " - max is " + maxClientCnxns ); sc.close(); } ... } {code} |
noob | 220275 | No Perforce job exists for this issue. | 0 | 33291 | 8 years, 15 weeks, 6 days ago | 0|i0627z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1324 | Remove Duplicate NEWLEADER packets from the Leader to the Follower. |
Improvement | Closed | Critical | Fixed | Flavio Paiva Junqueira | Mahadev Konar | Mahadev Konar | 09/Dec/11 13:39 | 13/Mar/14 14:17 | 14/May/13 08:29 | 3.5.0 | 3.4.6, 3.5.0 | quorum | 0 | 10 | ZOOKEEPER-1694, ZOOKEEPER-1697, ZOOKEEPER-107 | 220241 | No Perforce job exists for this issue. | 11 | 32619 | 6 years, 2 weeks ago | 0|i05y2n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1323 | c client doesn't compile on freebsd |
Bug | Closed | Major | Fixed | Michi Mutsuzaki | Michi Mutsuzaki | Michi Mutsuzaki | 08/Dec/11 20:28 | 29/Dec/11 18:46 | 14/Dec/11 18:19 | 3.4.0 | 3.4.2, 3.5.0 | c client | 0 | 1 | freebsd 6.4 | EAI_NODATA and EAI_ADDRFAMILY have been deprecated in FreeBSD. I'm getting this error: src/zookeeper.c: In function `getaddrinfo_errno': src/zookeeper.c:446: error: `EAI_NODATA' undeclared (first use in this function) src/zookeeper.c:446: error: (Each undeclared identifier is reported only once src/zookeeper.c:446: error: for each function it appears in.) src/zookeeper.c: In function `getaddrs': src/zookeeper.c:581: error: `EAI_ADDRFAMILY' undeclared (first use in this function) I'll submit a patch. --Michi |
220141 | No Perforce job exists for this issue. | 1 | 32620 | 8 years, 15 weeks ago |
Reviewed
|
0|i05y2v: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1322 | Cleanup/fix logging in Quorum code. |
Improvement | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 07/Dec/11 19:47 | 06/Feb/12 05:58 | 06/Feb/12 03:44 | 3.4.0, 3.5.0 | 3.4.3, 3.5.0 | server | 0 | 0 | While triaging ZOOKEEPER-1319 I updated the code with the attached patch in order to help debug what was going on with that issue. I think it would be useful to include these changes in the project itself. ff to include in 3.4.1 or push to 3.5.0. You should verify this with TRACE logging turned on in addition to INFO (default). |
220004 | No Perforce job exists for this issue. | 2 | 33292 | 8 years, 7 weeks, 3 days ago | Improved logging in Quorum Code. |
Reviewed
|
0|i06287: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1321 | Add number of client connections metric in JMX and srvr |
Improvement | Resolved | Major | Fixed | Neha Narkhede | Neha Narkhede | Neha Narkhede | 07/Dec/11 11:19 | 10/Feb/12 19:16 | 10/Feb/12 19:16 | 3.3.4, 3.4.2 | 3.4.4, 3.5.0 | 0 | 4 | The related conversation on the zookeeper user mailing list is here - http://apache.markmail.org/message/4jjcmooniowwugu2?q=+list:org.apache.hadoop.zookeeper-user It is useful to be able to monitor the number of disconnect operations on a client. This is generally indicative of a client going through a large number of GCs and hence disconnecting way too often from a zookeeper cluster. Today, this information is only indirectly exposed as part of the stat command, which requires counting the results. That's a lot of work for the server to do just to get a connection count. For monitoring purposes, it will be useful to have this exposed through JMX and the 4lw srvr. |
patch | 219931 | No Perforce job exists for this issue. | 8 | 12496 | 8 years, 6 weeks, 6 days ago | 0|i02hvb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1320 | Add the feature to zookeeper allow client limitations by ip. |
New Feature | Resolved | Major | Incomplete | Leader Ni | Leader Ni | Leader Ni | 06/Dec/11 05:43 | 18/Mar/12 02:28 | 18/Mar/12 02:28 | 3.3.3 | server | 0 | 0 | 604800 | 604800 | 0% | Linux version 2.6.18-164.el5 (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)), jdk-1.6.0_17 | Add the feature to zookeeper so that administrator can set the list of ips that limit which nodes can connect to the zk servers and which connected clients can operate on data. | 0% | 0% | 604800 | 604800 | client,server,limited,ipfilter | 219737 | No Perforce job exists for this issue. | 4 | 33293 | 8 years, 1 week, 5 days ago | Add the feature to zookeeper so that administrator can set the list of ips that limit which nodes can connect to the zk servers and which connected clients can operate on data. |
0|i0628f: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1319 | Missing data after restarting+expanding a cluster |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | Jeremy Stribling | Jeremy Stribling | 05/Dec/11 22:06 | 16/Dec/11 20:33 | 09/Dec/11 14:09 | 3.4.0 | 3.4.1, 3.5.0 | 0 | 4 | Linux (Debian Squeeze) | I've been trying to update to ZK 3.4.0 and have had some issues where some data become inaccessible after adding a node to a cluster. My use case is a bit strange (as explained before on this list) in that I try to grow the cluster dynamically by having an external program automatically restart Zookeeper servers in a controlled way whenever the list of participating ZK servers needs to change. This used to work just fine in 3.3.3 (and before), so this represents a regression. The scenario I see is this: 1) Start up a 1-server ZK cluster (the server has ZK ID 0). 2) A client connects to the server, and makes a bunch of znodes, in particular a znode called "/membership". 3) Shut down the cluster. 4) Bring up a 2-server ZK cluster, including the original server 0 with its existing data, and a new server with ZK ID 1. 5) Node 0 has the highest zxid and is elected leader. 6) A client connecting to server 1 tries to "get /membership" and gets back a -101 error code (no such znode). 7) The same client then tries to "create /membership" and gets back a -110 error code (znode already exists). 8) Clients connecting to server 0 can successfully "get /membership". I will attach a tarball with debug logs for both servers, annotating where steps #1 and #4 happen. You can see that the election involves a proposal for zxid 110 from server 0, but immediately following the election server 1 has these lines: 2011-12-05 17:18:48,308 9299 [QuorumPeer[myid=1]/127.0.0.1:2901] WARN org.apache.zookeeper.server.quorum.Learner - Got zxid 0x100000001 expected 0x1 2011-12-05 17:18:48,313 9304 [SyncThread:1] INFO org.apache.zookeeper.server.persistence.FileTxnLog - Creating new log file: log.100000001 Perhaps that's not relevant, but it struck me as odd. 
At the end of server 1's log you can see a repeated cycle of getData->create->getData as the client tries to make sense of the inconsistent responses. The other piece of information is that if I try to use the on-disk directories for either of the servers to start a new one-node ZK cluster, all the data are accessible. I haven't tried writing a program outside of my application to reproduce this, but I can do it very easily with some of my app's tests if anyone needs more information. |
cluster, data | 219708 | No Perforce job exists for this issue. | 5 | 32621 | 8 years, 15 weeks, 6 days ago |
Reviewed
|
0|i05y33: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1318 | In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly |
Bug | Resolved | Major | Fixed | Henry Robinson | Jim Fulton | Jim Fulton | 04/Dec/11 14:06 | 11/May/12 07:00 | 09/May/12 21:54 | 3.3.3 | 3.3.6, 3.4.4, 3.5.0 | contrib-bindings | 1 | 5 | Mac OS X (at least) | In Python binding, get_children (and get and exists, and probably others) with expired session doesn't raise exception properly. >>> zookeeper.state(h) -112 >>> zookeeper.get_children(h, '/') Traceback (most recent call last): File "<console>", line 1, in <module> SystemError: error return without exception set Let me know if you'd like me to work on a patch. |
219526 | No Perforce job exists for this issue. | 2 | 32622 | 7 years, 45 weeks, 6 days ago |
Reviewed
|
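The ZOOKEEPER-1318 report above shows `zookeeper.get_children` dying with `SystemError: error return without exception set` when the session has expired. A minimal sketch of the behaviour the report asks for, in plain Python: translate the expired-session state into a real, catchable exception. All names here (`SESSION_EXPIRED_STATE`, `FakeHandle`, `SessionExpiredError`, this `get_children`) are illustrative stand-ins, not the real zkpython API.

```python
# Hypothetical sketch: raise a proper exception on an expired session
# instead of returning an error code with no Python exception set.

SESSION_EXPIRED_STATE = -112  # the state() value observed in the bug report

class SessionExpiredError(Exception):
    """Raised instead of the bare 'error return without exception set'."""

class FakeHandle:
    def __init__(self, state):
        self._state = state
    def state(self):
        return self._state

def get_children(handle, path):
    # A correct binding turns the expired-session state into an exception
    # the caller can catch, rather than a SystemError.
    if handle.state() == SESSION_EXPIRED_STATE:
        raise SessionExpiredError("session expired; cannot read %s" % path)
    return []  # a live handle would return the real child list

try:
    get_children(FakeHandle(SESSION_EXPIRED_STATE), "/")
except SessionExpiredError as e:
    print("caught:", e)
```

The same guard would apply to `get`, `exists`, and the other calls the reporter mentions.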
0|i05y3b: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1317 | Possible segfault in zookeeper_init |
Bug | Closed | Minor | Fixed | Akira Kitada | Akira Kitada | Akira Kitada | 04/Dec/11 08:48 | 16/Dec/11 20:33 | 09/Dec/11 13:37 | 3.3.3, 3.4.0 | 3.4.1, 3.5.0 | c client | 0 | 1 | zookeeper_init does not check the return value of strdup(index_chroot). When it returns NULL, a segfault follows when strlen(zh->chroot) is called. |
219508 | No Perforce job exists for this issue. | 1 | 32623 | 8 years, 15 weeks, 6 days ago |
Reviewed
|
0|i05y3j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1316 | zookeeper_init leaks memory if chroot is just '/' |
Bug | Closed | Minor | Fixed | Akira Kitada | Akira Kitada | Akira Kitada | 04/Dec/11 08:34 | 16/Dec/11 20:33 | 08/Dec/11 17:30 | 3.3.3, 3.4.0 | 3.4.1, 3.5.0 | c client | 0 | 1 | zookeeper_init does not free strdup'ed memory when chroot is just '/'. |
219507 | No Perforce job exists for this issue. | 1 | 32624 | 8 years, 15 weeks, 6 days ago |
Reviewed
|
0|i05y3r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1315 | zookeeper_init always reports sessionPasswd=<hidden> |
Bug | Closed | Minor | Fixed | Akira Kitada | Akira Kitada | Akira Kitada | 04/Dec/11 05:31 | 16/Dec/11 20:33 | 08/Dec/11 19:21 | 3.3.4, 3.4.0 | 3.4.1, 3.5.0 | c client | 0 | 1 | zookeeper_init always reports sessionPasswd=<hidden> even when it's empty. | 219502 | No Perforce job exists for this issue. | 1 | 32625 | 8 years, 15 weeks, 6 days ago |
Reviewed
|
0|i05y3z: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1314 | improve zkpython synchronous api implementation |
Improvement | Open | Minor | Unresolved | Daniel Lescohier | Daniel Lescohier | Daniel Lescohier | 01/Dec/11 16:53 | 26/Jul/13 14:21 | 3.3.3 | contrib-bindings | 1 | 5 | 1800 | 1800 | 0% | Improves the following items in zkpython which are related to the Zookeeper synchronous API: # For pyzoo_create, no longer limit the returned znode name to 256 bytes; dynamically allocate memory on the heap. # For all the synchronous api calls, release the Python Global Interpreter Lock just before doing the synchronous call. I will attach the patch shortly. |
0% | 0% | 1800 | 1800 | 219245 | No Perforce job exists for this issue. | 2 | 41993 | 6 years, 34 weeks, 6 days ago | Improves zkpython synchronous api; release GIL before synchronous calls, and do not limit returned znode name to 256 bytes for synchronous create call. |
0|i07jwv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1313 | Expose/create KeeperException for "Packet len <x> is out of range!" error when jute max buffer size is exceeded |
Bug | Open | Major | Unresolved | Unassigned | Daniel Lord | Daniel Lord | 30/Nov/11 19:26 | 15/Sep/14 23:06 | 0 | 3 | When a zookeeper client receives a packet that is over the jute max buffer limit, the behavior exposed to the callers of the zookeeper client is misleading. When the packet length exceeds the max size, an IOException is thrown. This is caught and handled by the SendThread by cleaning up the current connection and enqueueing a Disconnected event. The immediate caller of zookeeper sees this as a ConnectionLossException with a Disconnected event on the main Watcher. This state transition is misleading because under many circumstances, as soon as the SyncConnected event is received, retrying the same operation will succeed. However, in this case it is likely that the zookeeper client will reconnect immediately, and if the operation is retried the same jute max buffer limit exception will be thrown, which will trigger another disconnect and reconnect. It would be great if the exception were exposed to the caller of the zookeeper client somehow so that a more appropriate action could be taken. For instance, it might be appropriate to fail completely or to attempt to establish a new session. |
219111 | No Perforce job exists for this issue. | 0 | 32626 | 5 years, 27 weeks, 2 days ago | 0|i05y47: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
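The ZOOKEEPER-1313 description above explains why collapsing "packet too large" into a generic connection loss traps callers in a disconnect/reconnect loop. A minimal sketch of the distinction the issue asks for: a dedicated, non-retryable error subtype lets a retry loop fail fast instead of spinning. The names (`ConnectionLossError`, `PacketTooLargeError`, `fetch`) are hypothetical, not the real client API.

```python
# Sketch of the retry trap in ZOOKEEPER-1313: an oversized reply that only
# surfaces as a generic connection loss makes a naive retry loop spin forever.

class ConnectionLossError(Exception):
    pass

class PacketTooLargeError(ConnectionLossError):
    """What the issue asks for: a distinguishable, non-retryable subtype."""

def fetch(reply_len, max_len=1024 * 1024):
    if reply_len > max_len:
        # Today's behaviour collapses this into a plain connection loss;
        # a dedicated subtype lets callers tell the two cases apart.
        raise PacketTooLargeError("Packet len %d is out of range!" % reply_len)
    return b"x" * reply_len

def fetch_with_retry(reply_len, attempts=5):
    for _ in range(attempts):
        try:
            return fetch(reply_len)
        except PacketTooLargeError:
            raise            # retrying cannot help: fail fast
        except ConnectionLossError:
            continue         # transient loss: retrying may succeed
    raise ConnectionLossError("gave up after %d attempts" % attempts)

print(len(fetch_with_retry(10)))        # small reply succeeds: 10
try:
    fetch_with_retry(2 * 1024 * 1024)   # oversized reply fails immediately
except PacketTooLargeError as e:
    print("non-retryable:", e)
```

Note the `except` clause order: the more specific subtype must be caught before its `ConnectionLossError` base class.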
| ZooKeeper | ZOOKEEPER-1312 | Add a "getChildrenWithStat" operation |
New Feature | Open | Major | Unresolved | Unassigned | Daniel Lord | Daniel Lord | 30/Nov/11 18:46 | 02/Dec/11 19:54 | 0 | 1 | It would be extremely useful to be able to have a "getChildrenWithStat" method. This method would behave exactly the same as getChildren but in addition to returning the list of all child znode names it would also return a Stat for each child. I'm sure there are quite a few use cases for this but it could save a lot of extra reads for my application. | newbie | 219102 | No Perforce job exists for this issue. | 0 | 41994 | 8 years, 16 weeks, 5 days ago | 0|i07jx3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1311 | ZooKeeper test jar is broken |
Bug | Closed | Blocker | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 30/Nov/11 13:04 | 16/Dec/11 20:33 | 01/Dec/11 02:14 | 3.4.0 | 3.4.1, 3.5.0 | 0 | 1 | In http://repo1.maven.org/maven2/org/apache/zookeeper/zookeeper/3.4.0/ the test jar cannot be accessed by maven. There are two possible solutions to this. a) rename zookeeper-3.4.0-test.jar to zookeeper-3.4.0-tests.jar and remove zookeeper-3.4.0-test.pom* With this, the maven can access the test jar with {code} <dependency> <groupId>org.apache.zookeeper</groupId> <artifactId>zookeeper</artifactId> <version>3.4.0</version> <type>test-jar</type> <scope>test</scope> </dependency> {code} b) Alternatively, zookeeper test could be it's own submodule. To do this, it must be deployed in the following layout {code} ./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.jar ./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.jar.md5 ./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.jar.sha1 ./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.pom ./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.pom.md5 ./org/apache/zookeeper/zookeeper-test/3.4.0-BK-SNAPSHOT/zookeeper-test-3.4.0.pom.sha1 {code} This can then be accessed by maven with {code} <dependency> <groupId>org.apache.zookeeper</groupId> <artifactId>zookeeper-test</artifactId> <version>3.4.0</version> <scope>test</scope> </dependency> {code} I think a) is the better solution. |
219050 | No Perforce job exists for this issue. | 1 | 32627 | 8 years, 17 weeks ago |
Reviewed
|
0|i05y4f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1310 | C Api should use state CONNECTION_LOSS |
New Feature | Open | Major | Unresolved | Unassigned | Jakub Lekstan | Jakub Lekstan | 30/Nov/11 07:47 | 06/May/12 00:51 | c client | 0 | 1 | Linux | I would like ZooKeeper to notify my watcher (the one passed to zookeeper_init) about CONNECTION_LOSS; right now the watcher doesn't know that the connection is lost, so I can't react to it. What do you think? If so, I could try to create a patch. |
219010 | No Perforce job exists for this issue. | 0 | 41995 | 7 years, 46 weeks, 4 days ago | 0|i07jxb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1309 | Creating a new ZooKeeper client can leak file handles |
Bug | Resolved | Critical | Fixed | Daniel Lord | Daniel Lord | Daniel Lord | 28/Nov/11 21:03 | 26/Feb/12 19:32 | 26/Feb/12 19:32 | 3.3.4 | 3.3.5 | java client | 3 | 1 | If there is an IOException thrown by the constructor of ClientCnxn then file handles are leaked because of the initialization of the Selector which is never closed. final Selector selector = Selector.open(); If there is an abnormal exit from the constructor then the Selector is not closed and file handles are leaked. You can easily see this by setting the hosts string to garbage ("qwerty", "asdf", etc.) and then try to open a new ZooKeeper connection. I've observed the same behavior in production when there were DNS issues where the host names of the ensemble can no longer be resolved and the application servers quickly run out of handles attempting to (re)connect to zookeeper. |
218779 | No Perforce job exists for this issue. | 4 | 32628 | 8 years, 4 weeks, 4 days ago | 0|i05y4n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
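ZOOKEEPER-1309 above describes a Java `Selector` handle leaking when the `ClientCnxn` constructor throws after `Selector.open()`. The pattern of the fix, sketched in plain Python with illustrative stand-in names (`FakeSelector`, `resolve`, `Client`): any failure path after the handle is acquired must close it before propagating the error.

```python
# Sketch of the resource-leak pattern in ZOOKEEPER-1309: a constructor opens
# a selector-like resource, then a later step (DNS resolution in the report)
# fails; unless the failure path closes the resource, its handle leaks.

class FakeSelector:
    open_count = 0              # tracks "file handles" still open
    def __init__(self):
        FakeSelector.open_count += 1
    def close(self):
        FakeSelector.open_count -= 1

def resolve(host):
    # Stand-in for the DNS lookup that fails in the report ("qwerty" etc.)
    if host == "qwerty":
        raise OSError("unknown host: %s" % host)
    return "10.0.0.1"

class Client:
    def __init__(self, host):
        self.selector = FakeSelector()
        try:
            self.addr = resolve(host)
        except OSError:
            # The fix: release the already-acquired handle before
            # propagating the constructor failure.
            self.selector.close()
            raise

try:
    Client("qwerty")
except OSError:
    pass
print("handles still open:", FakeSelector.open_count)  # 0 after the fix
```

Without the `try`/`except` around the post-open work, each failed construction would strand one handle, which matches the production symptom of running out of file handles during DNS outages.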
| ZooKeeper | ZOOKEEPER-1308 | Guaranteed NPE in WriteLock recipe |
Bug | Resolved | Minor | Invalid | Unassigned | Mark Miller | Mark Miller | 24/Nov/11 08:43 | 24/Nov/11 08:48 | 24/Nov/11 08:48 | recipes | 0 | 0 | {code} public boolean execute() throws KeeperException, InterruptedException { do { if (id == null) { long sessionId = zookeeper.getSessionId(); String prefix = "x-" + sessionId + "-"; // lets try look up the current ID if we failed // in the middle of creating the znode findPrefixInChildren(prefix, zookeeper, dir); idName = new ZNodeName(id); } {code} ZNodeName will throw an NPE if id is null. |
218348 | No Perforce job exists for this issue. | 0 | 32629 | 8 years, 18 weeks ago | 0|i05y4v: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1307 | zkCli.sh is exiting when an Invalid ACL exception is thrown from setACL command through client |
Bug | Resolved | Minor | Fixed | kavita sharma | amith | amith | 23/Nov/11 04:07 | 23/Apr/12 13:17 | 16/Mar/12 20:42 | 3.4.4, 3.5.0 | java client | 0 | 3 | ZOOKEEPER-1391, ZOOKEEPER-271 | zkCli.sh | use consoleClient (zkCli.sh) and issue setAcl /temp abc [zk: XX.XX.XX.XX:XXXX(CONNECTED) 17] setAcl /temp abc abc does not have the form scheme:id:perm Exception in thread "main" org.apache.zookeeper.KeeperException$InvalidACLException: KeeperErrorCode = InvalidACL at org.apache.zookeeper.ZooKeeper.setACL(ZooKeeper.java:1172) at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:717) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:582) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:354) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:312) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:271) linux-xxx:/zookeeper1/bin # if any InvalidACLException is thrown then client is exiting. client should be able to handle this kind of issues |
newbie | 218174 | No Perforce job exists for this issue. | 1 | 32630 | 8 years, 1 week, 5 days ago |
Reviewed
|
0|i05y53: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1306 | hang in zookeeper_close() |
Bug | Open | Major | Unresolved | Michael Lee | helei | helei | 19/Nov/11 02:02 | 29/Nov/11 16:52 | 3.3.3 | c client | 1 | 0 | With patch ZOOKEEPER-981, I saw another problem. Hang in zookeeper_close() again. here is the stack: (gdb) bt #0 0x000000302b80adfb in __lll_mutex_lock_wait () from /lib64/tls/libpthread.so.0 #1 0x000000302b1307a8 in main_arena () from /lib64/tls/libc.so.6 #2 0x000000302b910230 in stack_used () from /lib64/tls/libpthread.so.0 #3 0x000000302b808dde in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib64/tls/libpthread.so.0 #4 0x00000000006b4ce8 in adaptor_finish (zh=0x6902060) at src/mt_adaptor.c:217 #5 0x00000000006b0fd0 in zookeeper_close (zh=0x6902060) at src/zookeeper.c:2297 (gdb) p zh->ref_counter $5 = 1 (gdb) p zh->close_requested $6 = 1 (gdb) p *zh $7 = {fd = 110112576, hostname = 0x6903620 "", addrs = 0x0, addrs_count = 1, watcher = 0x62e5dc <doris::meta_register_mgr_t::register_mgr_watcher(_zhandle*, int, int, char const*, void*)>, last_recv = {tv_sec = 1321510694, tv_usec = 552835}, last_send = {tv_sec = 1321510694, tv_usec = 552886}, last_ping = {tv_sec = 1321510685, tv_usec = 774869}, next_deadline = { tv_sec = 1321510704, tv_usec = 547831}, recv_timeout = 30000, input_buffer = 0x0, to_process = {head = 0x0, last = 0x0, lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, to_send = {head = 0x0, last = 0x0, lock = { __m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 1, __m_lock = {__status = 0, __spinlock = 0}}}, sent_requests = {head = 0x0, last = 0x0, cond = {__c_lock = {_status = 1, __spinlock = -1}, __c_waiting = 0x0, __padding = '\0' <repeats 15 times>, __align = 0}, lock = {_m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, completions_to_process = {head = 0x2aefbff800, last = 0x2af0e05f40, cond = {__c_lock = {__status = 592705486850, __spinlock = -1}, __c_waiting = 0x45, _padding = 
"E\000\000\000\000\000\000\000\220\006\000\000\000", __align = 296352743424}, lock = {_m_reserved = 1, __m_count = 0, __m_owner = 0x1000026ca, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, connect_index = 0, client_id = {client_id = 86551148676999146, passwd = "G懵擀\233\213\f闬202筴\002錪\034"}, last_zxid = 82057372, outstanding_sync = 0, primer_buffer = {buffer = 0x6902290 "", len = 40, curr_offset = 44, next = 0x0}, primer_storage = {len = 36, protocolVersion = 0, timeOut = 30000, sessionId = 86551148676999146, passwd_len = 16, passwd = "G懵擀\233\213\f闬202筴\002錪\034"}, primer_storage_buffer = "\000\000\000$\000\000\000\000\000\000u0\0013}惜薵闬000\000\000\020G懵擀\233\213\f闬202筴\002錪\034", state = 0, context = 0x0, auth_h = {auth = 0x0, lock = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}}, ref_counter = 1, close_requested = 1, adaptor_priv = 0x0, socket_readable = {tv_sec = 0, tv_usec = 0}, active_node_watchers = 0x6901520, active_exist_watchers = 0x69015d0, active_child_watchers = 0x6902ef0, chroot = 0x0} I think the ref_counter is suposed to be 2 or 3 or 4 here. it seems not correct. I think maybe we should increase the ref_counter before we set zh->close_request=1, otherwise the do_io thread and do_completion thread may release the handler just after we set zh->close_request and before we increase zh->ref_counter. Thanks again |
217774 | No Perforce job exists for this issue. | 1 | 32631 | 8 years, 17 weeks, 2 days ago | must exclude patch in ZOOKEEPER-981 | 0|i05y5b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1305 | zookeeper.c:prepend_string func can dereference null ptr |
Bug | Closed | Major | Fixed | Daniel Lescohier | Daniel Lescohier | Daniel Lescohier | 18/Nov/11 13:23 | 02/May/12 22:06 | 08/Dec/11 17:03 | 3.3.3 | 3.4.1, 3.3.6, 3.5.0 | c client | 0 | 3 | 1800 | 1800 | 0% | ZOOKEEPER-1461 | All | All the callers of the function prepend_string make a call to prepend_string before checking that zhandle_t *zh is not null. At the top of prepend_string, zh is dereferenced without checking for a null ptr: static char* prepend_string(zhandle_t *zh, const char* client_path) { char *ret_str; if (zh->chroot == NULL) return (char *) client_path; I propose fixing this by adding the check here in prepend_string: static char* prepend_string(zhandle_t *zh, const char* client_path) { char *ret_str; if (zh==NULL || zh->chroot == NULL) return (char *) client_path; |
0% | 0% | 1800 | 1800 | patch | 217712 | No Perforce job exists for this issue. | 2 | 32632 | 7 years, 47 weeks ago | return ZBADARGUMENTS when passed NULL zhandle instead of dereferencing null pointer. | 0|i05y5j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1304 | [IGNORE THIS --- MOVING TO BOOKKEEPER JIRA] publish and subscribe methods get ServiceDownException even when the hubs, bookies, and zookeepers are running |
Bug | Resolved | Major | Duplicate | Unassigned | Daniel Kim | Daniel Kim | 17/Nov/11 18:37 | 09/Oct/13 20:09 | 09/Oct/13 20:09 | 3.5.0 | 0 | 0 | 1209600 | 1209600 | 0% | CentOS 5.5 for all servers and workstations (however zookeeper, bookies, and hubs are all built in Ubuntu 11); OpenJDK Runtime Environment (IcedTea6 1.9.10) (rhel-1.23.1.9.10.el5_7-i386); OpenJDK Client VM (build 19.0-b09, mixed mode); |
**[Sorry. I don't know how to delete an issue that is already submitted. I just learned of the Bookkeeper jira, and I will submit this issue there instead. You can all ignore this issue.] Since I couldn't finish building all hedwig components in CentOS, I built it successfully in Ubuntu, then I deployed it to CentOS (no ubuntu image in my company's cloud). I configured zookeeper, bookies and hubs as they were described in the documentations. First, I copied TestPubSubClient.java's publish and subscribe tests into my own test code. I also had to create another object that extends ClientConfiguration. I named it "HedwigConf", and overwrote getDefaultServerHedwigSocketAddress() method because the server was not on the same machine as the workstation. I targetted the right host and publish seemed to work. However, it throws me ServiceDownException for publish sometimes. I checked the logs of the hubs. They seem to have connected ok with the bookies. There was no error or warning there. However, the problem seemed to exist in bookies and zookeeper. This was found in the zookeeper log: "Got user-level KeeperException when processing sessionid:0x----------- type:create cxid:0x5 zxid:0x29 txntype:-1 reqpath:n/a Error Path:/hedwig/standalone/topics Error:KeeperErrorCode = NoNode for /hedwig/standalone/topics". Normally this znode path is created automatically. Also, some bookies complained this: "WARN [NIOServerFactory] org.apache.bookkeeper.proto.NIOServerFactory - Exception in server socket loop: /0:0:0:0:0:0:0:0 java.lang.NullPointerException". For some reason, this problem comes and goes. Sometimes everything just works and the new topic is saved in a new znode, and the message is saved in bookie(s). I spent hours trying to recreate this yesterday, but I couldn't. Now it is back again. Subscribe seems to have the similar issue. |
0% | 0% | 1209600 | 1209600 | 217600 | No Perforce job exists for this issue. | 0 | 32633 | 8 years, 19 weeks ago | hedwig-client, hedwig, bookies | 0|i05y5r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1303 | Observer LearnerHandlers are not removed from Leader collection. |
Bug | Resolved | Minor | Duplicate | Ashish Mishra | Ashish Mishra | Ashish Mishra | 17/Nov/11 18:32 | 30/Apr/14 16:25 | 30/Apr/14 16:25 | 3.3.4 | 3.4.4, 3.5.0 | scripts | 1 | 4 | 604800 | 604800 | 0% | The Leader.removeLearnerHandler() call removes handlers from the forwardingFollowers and learners sets, but not from observingLearners. This will cause a leak if observers are repeatedly connected and disconnected from the ensemble. | 0% | 0% | 604800 | 604800 | 217599 | No Perforce job exists for this issue. | 1 | 32634 | 5 years, 47 weeks, 1 day ago | 0|i05y5z: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1302 | patch to create rpm/deb on 3.3 branch |
Improvement | Resolved | Major | Won't Fix | Giridharan Kesavan | Giridharan Kesavan | Giridharan Kesavan | 16/Nov/11 16:41 | 17/Jan/12 16:28 | 17/Nov/11 00:08 | 3.3.3 | build | 0 | 0 | backport zookeeper-999 patch to 3.3 branch and add zookeeper-setup-conf.sh to enable zk quorum setup | 217434 | No Perforce job exists for this issue. | 3 | 33294 | 8 years, 10 weeks, 2 days ago | 0|i0628n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1301 | backport patches related to the zk startup script from 3.4 to 3.3 release |
Improvement | Closed | Major | Fixed | Giridharan Kesavan | Giridharan Kesavan | Giridharan Kesavan | 16/Nov/11 15:00 | 29/Nov/11 12:54 | 17/Nov/11 00:22 | 3.3.4 | 3.3.4 | 0 | 0 | 217413 | No Perforce job exists for this issue. | 3 | 33295 | 8 years, 19 weeks ago |
Reviewed
|
0|i0628v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1300 | Rat complains about inconsistent licenses in the src files. |
Bug | Resolved | Major | Duplicate | Mahadev Konar | Mahadev Konar | Mahadev Konar | 16/Nov/11 14:43 | 21/Jul/14 16:50 | 21/Jul/14 16:50 | 3.4.0 | 3.5.0 | 0 | 0 | From phunt: {noformat} Note: I even tried upgrading to RAT 0.8 and this is the output: (same/similar) [rat:report] 15 Unknown Licenses [rat:report] [rat:report] ******************************* [rat:report] [rat:report] Unapproved licenses: [rat:report] [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/README_packaging.txt [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/contrib/ZooInspector/licences/epl-v10.html [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/c/include/winstdint.h [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/log4j.properties [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/date.format.js [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.bar.js [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.dot.js [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.line.js [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.pie.js [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.raphael.js [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/raphael.js [rat:report] 
/home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/yui-min.js [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/monitoring/JMX-RESOURCES [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/zooinspector/lib/log4j.properties [rat:report] /home/phunt/Downloads/zookeeper-3.4.0/build/zookeeper-3.4.0/src/contrib/zooinspector/licences/epl-v10.html {noformat} |
217410 | No Perforce job exists for this issue. | 0 | 32635 | 5 years, 35 weeks, 3 days ago | 0|i05y67: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1299 | Add winconfig.h file to ignore in release audit. |
Bug | Closed | Major | Fixed | Mahadev Konar | Mahadev Konar | Mahadev Konar | 16/Nov/11 01:53 | 23/Nov/11 14:22 | 16/Nov/11 02:19 | 3.4.0 | 3.4.0 | 0 | 1 | We need to add the winconfig.h to ignores in release audits. | 217315 | No Perforce job exists for this issue. | 0 | 32636 | 8 years, 19 weeks, 1 day ago | 0|i05y6f: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1298 | config.h gets emptied by make, at least on Mac OS X 10.6.8 |
Bug | Open | Major | Unresolved | Unassigned | Jim Fulton | Jim Fulton | 12/Nov/11 12:29 | 12/Nov/11 12:29 | 3.3.3 | c client | 0 | 0 | Mac OS X 10.6.8 | configure creates a working config.h. On Snow leopard, after running configure: ls -l config.h -rw-r--r-- 1 jim jim 4437 Nov 12 12:16 config.h which looks reasomnable. Running make replaces config.h: make (CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh /Users/jim/s/zookeeper-3.3.3/src/c/missing --run autoheader) /opt/local/bin/gm4: cannot open `configure.in': No such file or directory rm -f stamp-h1 touch config.h.in cd . && /bin/sh ./config.status config.h config.status: creating config.h make all-am /bin/sh ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c -o zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c libtool: compile: gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -fno-common -DPIC -o .libs/zookeeper.o src/zookeeper.c: In function 'log_env': src/zookeeper.c:658: error: 'PACKAGE_STRING' undeclared (first use in this function) src/zookeeper.c:658: error: (Each undeclared identifier is reported only once src/zookeeper.c:658: error: for each function it appears in.) cc1: warnings being treated as errors src/zookeeper.c:647: warning: unused variable 'buf' make[1]: *** [zookeeper.lo] Error 1 make: *** [all] Error 2 ls -l config.h -rw-r--r-- 1 jim jim 137 Nov 12 12:17 config.h config.h is empty, except for a comment. If I make a copy of config.h after configure and restore it after running the failed make, then I can run make again and the make succeeds. On a centos 5 vm, I can build just fine, but I suspect that has something to do with it not being happy with autoconf: (CDPATH="${ZSH_VERSION+.}:" && cd . 
&& /bin/sh /home/jim/s/zookeeper-3.3.3/src/c/missing --run autoheader) aclocal.m4:20: warning: this file was generated for autoconf 2.67. You have another version of autoconf. It may work, but is not guaranteed to. If you have problems, you may need to regenerate the build system entirely. To do so, use the procedure documented by the package, typically `autoreconf'. configure.ac:21: error: Autoconf version 2.62 or higher is required aclocal.m4:8577: AM_INIT_AUTOMAKE is expanded from... configure.ac:21: the top level autom4te: /usr/bin/m4 failed with exit status: 63 autoheader: /usr/bin/autom4te failed with exit status: 63 WARNING: `autoheader' is probably too old. You should only need it if you modified `acconfig.h' or `configure.ac'. You might want to install the `Autoconf' and `GNU m4' packages. Grab them from any GNU archive site. rm -f stamp-h1 touch config.h.in cd . && /bin/sh ./config.status config.h config.status: creating config.h config.status: config.h is unchanged ... I'm pretty clueless wrt autoconf. I can work around this by touching config.h.in before running configure. That seems to lead to a clean make, presumably by bypassing the autoconf step. I don't know if that matters. :) My goal is to automate building on at least unix-like systems as part of building a self-contained source distribution of the Python extension that builds by just running it's setup script. |
216971 | No Perforce job exists for this issue. | 0 | 32637 | 8 years, 19 weeks, 5 days ago | 0|i05y6n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1297 | Add stat information to create() call |
New Feature | Resolved | Major | Fixed | Lenni Kuff | Gunnar Wagenknecht | Gunnar Wagenknecht | 11/Nov/11 07:51 | 23/Dec/13 19:10 | 19/Dec/12 13:17 | 3.3.3 | 3.5.0 | java client | 0 | 3 | ZOOKEEPER-1851 | In order to get a Stat object after creation one has to do another exists() call. This leaves client code vulnerable to a possible update window by another writer. All synchronous methods but the create() method allow passing in a Stat object for population. It would be nice if the create() method would also allow this. | newbie | 216867 | No Perforce job exists for this issue. | 3 | 2407 | 7 years, 14 weeks ago |
Reviewed
|
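ZOOKEEPER-1297 above notes that a `create()` followed by a separate `exists()` call leaves a window in which another writer can modify the znode, so the Stat read back no longer describes the creation. A sketch of that race with a tiny in-memory stand-in store (all names here are illustrative, not the ZooKeeper API):

```python
# Sketch of the race in ZOOKEEPER-1297: create() then exists() is not atomic,
# so a concurrent writer can bump the version in between.

store = {}  # path -> version

def create(path):
    store[path] = 0

def set_data(path):
    store[path] += 1

def exists(path):
    return {"version": store[path]}

# Non-atomic pattern: another writer sneaks in between the two calls.
create("/node")
set_data("/node")            # concurrent writer
stat = exists("/node")
print(stat["version"])       # 1, not the creator's version 0

# What the issue proposes: create() populates the Stat itself,
# so the returned version is guaranteed to describe the creation.
def create_with_stat(path):
    store[path] = 0
    return {"version": store[path]}

stat2 = create_with_stat("/other")
print(stat2["version"])      # 0
```

This is why the issue asks for a `create()` overload that fills in a Stat, mirroring the other synchronous methods.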
0|i00rmf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1296 | Add zookeeper-setup-conf.sh script |
Improvement | Open | Minor | Unresolved | Eric Yang | Eric Yang | Eric Yang | 09/Nov/11 17:40 | 05/Feb/20 07:17 | 3.4.0 | 3.7.0, 3.5.8 | scripts | 0 | 2 | Shell script | It would be nice to provide a setup script for zoo.cfg and zookeeper-env.sh. The proposed script will provide the following options: {noformat} usage: /usr/sbin/zookeeper-setup-conf.sh <parameters> Required parameters: --conf-dir Set ZooKeeper configuration directory --log-dir Set ZooKeeper log directory Optional parameters: --auto-purge-interval=1 Set snapshot auto purge interval --client-port=2181 Set client port --data-dir=/var/lib/zookeeper Set data directory --hosts=host1,host2 Set ZooKeeper quorum hostnames --init-limit=10 Set initial sync limit --java-home Set JAVA_HOME location --snapshot-count=3 Set snapshot retain count --sync-limit=5 Set sync limit --tick-time=2000 Set milliseconds of each tick {noformat} |
216672 | No Perforce job exists for this issue. | 4 | 41996 | 5 years, 51 weeks, 3 days ago | 0|i07jxj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1295 | Documentation for jute.maxbuffer is not correct in ZooKeeper Administrator's Guide |
Bug | Resolved | Major | Fixed | Mohammad Arshad | Daniel Lord | Daniel Lord | 09/Nov/11 14:23 | 27/Oct/19 05:28 | 28/May/16 13:00 | 3.5.2 | documentation | 0 | 6 | ZOOKEEPER-3593, ZOOKEEPER-2402 | The jute maxbuffer size is documented as being defaulted to 1 megabyte in the administrators guide. I believe that this is true server side but it is not true client side. On the client side the default is (at least in 3.3.2) this: packetLen = Integer.getInteger("jute.maxbuffer", 4096 * 1024); On the server side the documentation looks to be correct: private static int determineMaxBuffer() { String maxBufferString = System.getProperty("jute.maxbuffer"); try { return Integer.parseInt(maxBufferString); } catch(Exception e) { return 0xfffff; } } The documentation states this: jute.maxbuffer: (Java system property: jute.maxbuffer) This option can only be set as a Java system property. There is no zookeeper prefix on it. It specifies the maximum size of the data that can be stored in a znode. The default is 0xfffff, or just under 1M. If this option is changed, the system property must be set on all servers and clients otherwise problems will arise. This is really a sanity check. ZooKeeper is designed to store data on the order of kilobytes in size. |
newbie | 216652 | No Perforce job exists for this issue. | 0 | 32638 | 3 years, 42 weeks, 5 days ago | 0|i05y6v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
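The ZOOKEEPER-1295 description above quotes two different fallback values for `jute.maxbuffer`. Computing them side by side shows the discrepancy the documentation misses: the client default is 4 MiB while the server default is just under 1 MiB.

```python
# The two defaults quoted in ZOOKEEPER-1295, computed side by side.

client_default = 4096 * 1024   # Integer.getInteger("jute.maxbuffer", 4096 * 1024)
server_default = 0xfffff       # catch-all fallback in determineMaxBuffer()

print(client_default)          # 4194304 bytes (4 MiB)
print(server_default)          # 1048575 bytes (just under 1 MiB)
print(client_default // server_default)  # 4: the client default is ~4x larger
```

So "the default is 0xfffff, or just under 1M" is only accurate for the server side, which is exactly the documentation bug being reported.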
| ZooKeeper | ZOOKEEPER-1294 | One of the zookeeper server is not accepting any requests |
Bug | Resolved | Major | Fixed | kavita sharma | amith | amith | 09/Nov/11 04:54 | 24/May/13 20:17 | 13/Jan/12 19:08 | 3.5.0 | server | 0 | 7 | 3 Zookeeper + 3 Observer with SuSe-11 | In zoo.cfg I have configured: server.1 = XX.XX.XX.XX:65175:65173 server.2 = XX.XX.XX.XX:65185:65183 server.3 = XX.XX.XX.XX:65195:65193 server.4 = XX.XX.XX.XX:65205:65203:observer server.5 = XX.XX.XX.XX:65215:65213:observer server.6 = XX.XX.XX.XX:65225:65223:observer As above, I have configured 3 PARTICIPANTS and 3 OBSERVERS in a cluster of 6 zookeepers. Steps to reproduce the defect: 1. Start all 3 participant zookeepers 2. Stop all the participant zookeepers 3. Start zookeeper 1 (Participant) 4. Start zookeeper 2 (Participant) 5. Start zookeeper 4 (Observer) 6. Create a persistent node with an external client and close it 7. Stop zookeeper 1 (Participant; now the quorum is unstable) 8. Create a new client and try to find the node created before, using the exists API (will fail since the quorum is not satisfied) 9. Start zookeeper 1 (Participant; stabilises the quorum) Now check the observer using the 4 letter word (Server.4): linux-216:/home/amith/CI/source/install/zookeeper/zookeeper2/bin # echo stat | netcat localhost 65200 Zookeeper version: 3.3.2-1031432, built on 11/05/2010 05:32 GMT Clients: /127.0.0.1:46370[0](queued=0,recved=1,sent=0) Latency min/avg/max: 0/0/0 Received: 1 Sent: 0 Outstanding: 0 Zxid: 0x100000003 Mode: observer Node count: 5 Check participant 2 with the 4 letter word: Latency min/avg/max: 22/48/83 Received: 39 Sent: 3 Outstanding: 35 Zxid: 0x100000003 Mode: leader Node count: 5 linux-216:/home/amith/CI/source/install/zookeeper/zookeeper2/bin # Check participant 1 with the 4 letter word: linux-216:/home/amith/CI/source/install/zookeeper/zookeeper2/bin # echo stat | netcat localhost 65170 This ZooKeeper instance is not currently serving requests We can see the participant 1 logs filled with 2011-11-08 15:49:51,360 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:65170:NIOServerCnxn@642] - 
Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running The problem here is that participant 1 is not responding to / accepting any requests |
216580 | No Perforce job exists for this issue. | 4 | 32639 | 6 years, 43 weeks, 6 days ago |
Incompatible change, Reviewed
|
0|i05y73: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1293 | Remove unused readyToStart from Leader.java |
Improvement | Resolved | Trivial | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 09/Nov/11 00:31 | 06/Jan/12 20:24 | 05/Jan/12 20:33 | 3.5.0 | server, tests | 0 | 0 | After ZOOKEEPER-1194 readyToStart is no longer used. | 216559 | No Perforce job exists for this issue. | 3 | 33296 | 8 years, 11 weeks, 5 days ago |
Reviewed
|
0|i06293: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1292 | FLETest is flaky |
Improvement | Resolved | Major | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 07/Nov/11 16:46 | 24/Dec/11 05:57 | 23/Dec/11 14:47 | 3.5.0 | leaderElection | 0 | 0 | testLE in FLETest is convoluted, difficult to read, and doesn't test FLE appropriately. The goal of this jira is to clean it up and propose a more reasonable test case. | 216374 | No Perforce job exists for this issue. | 3 | 33297 | 8 years, 13 weeks, 5 days ago | 0|i0629b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1291 | ZOOKEEPER-1264 AcceptedEpoch not updated at leader before it proposes the epoch to followers |
Sub-task | Closed | Major | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 05/Nov/11 16:01 | 23/Nov/11 14:22 | 05/Nov/11 16:59 | 3.4.0 | 3.4.0, 3.5.0 | server | 0 | 1 | It is possible that a leader proposes an epoch e and a follower adopts it by setting acceptedEpoch to e but the leader itself hasn't yet done so. While I'm not sure this contradicts Zab (there is no description of where the leader actually sets its acceptedEpoch), it is very counterintuitive. The fix is to set acceptedEpoch in getEpochToPropose, i.e., before any LearnerHandler passes the getEpochToPropose barrier. The fix is done as part of ZK-1264. |
216187 | No Perforce job exists for this issue. | 0 | 33298 | 8 years, 20 weeks, 4 days ago | Revision 1198053 | 0|i0629j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1290 | zookeeper_init_with_watches |
New Feature | Open | Major | Unresolved | Unassigned | Marc Celani | Marc Celani | 04/Nov/11 22:04 | 05/Nov/11 02:06 | c client | 0 | 0 | Our use of zookeeper requires high scalability, and the underlying data set is small and changes infrequently. A persisted cache is ideal for solving scalability. We want to treat a restart as if it were a prolonged reconnect - that is, maintain the last known zxid and watch list. We would like to expose a new zookeeper_init_with_watches api that allows the zhandle to be initialized with the watch list and last known zxid. The change would reuse the current reconnect logic. | 216145 | No Perforce job exists for this issue. | 0 | 41997 | 8 years, 20 weeks, 5 days ago | 0|i07jxr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1289 | Multi Op Watch Events |
New Feature | Open | Major | Unresolved | Unassigned | Marc Celani | Marc Celani | 04/Nov/11 21:48 | 30/Nov/11 18:08 | c client, java client, server | 0 | 0 | Caches built on top of zookeeper clients can become inconsistent because of lack of multi op watches. Our clients receive watch notifications for paths one at a time, and in their watch handling, invalidate the path in the cache. However, the cache now has an inconsistent view of zookeeper, since it is receiving the notifications one at a time. In general, the watch handling semantics do not conform with the idea of a multi op. If changes can be made to multiple paths atomically, all clients should be notified of that change atomically. | 216143 | No Perforce job exists for this issue. | 0 | 41998 | 8 years, 17 weeks, 1 day ago | 0|i07jxz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1288 | ZOOKEEPER-1198 Always log sessionId and zxid as hexadecimals |
Sub-task | Open | Major | Unresolved | Unassigned | Thomas Koch | Thomas Koch | 04/Nov/11 14:33 | 14/Jun/18 15:42 | 0 | 0 | At some points, sessionIds or zxid are written in decimal numbers to the log but most of the time as hexadecimals. It's an unnecessary hassle to manually convert these numbers to find additional log lines referring the same numbers. Or worse people may not know that there may be additional information available if they also search for the decimal representation of a number. | 216099 | No Perforce job exists for this issue. | 0 | 41999 | 8 years, 20 weeks, 5 days ago | 0|i07jy7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
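The mixed-radix problem described above can be sketched in a few lines (class and method names here are illustrative, not ZooKeeper's actual logging code): the same sessionId rendered in decimal cannot be grepped against the hexadecimal form most log lines use.

```java
// Illustrative sketch for ZOOKEEPER-1288 (names are hypothetical, not
// ZooKeeper code): the same sessionId rendered two ways. A grep for the
// hex form will never match a log line that used the decimal form.
public class SessionIdLogging {
    static String asDecimal(long sessionId) {
        return Long.toString(sessionId);
    }

    static String asHex(long sessionId) {
        return "0x" + Long.toHexString(sessionId);
    }

    public static void main(String[] args) {
        long sessionId = 0x13b2f1a5c440001L; // made-up session id
        System.out.println(asDecimal(sessionId)); // decimal form, rare in the logs
        System.out.println(asHex(sessionId));     // the usual log form
    }
}
```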
| ZooKeeper | ZOOKEEPER-1287 | ZOOKEEPER-1285 DataTree deserialization methods should return DataTree instance |
Sub-task | Open | Minor | Unresolved | Unassigned | Thomas Koch | Thomas Koch | 04/Nov/11 13:28 | 14/Jun/18 15:42 | 0 | 0 | There are a couple of deserialization methods that all receive a new DataTree instance as parameter forwarding this instance in a row until the last in the row populates this instance. While this pattern is derived from jute there's no reason not to instantiate a new DataTree object in the last deserialization method and returning it through the stack. That makes it easier to reason about the code because it then is obvious that the DataTree instance worked on is indeed a new instance. | 216093 | No Perforce job exists for this issue. | 0 | 42000 | 8 years, 20 weeks, 6 days ago | 0|i07jyf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1286 | ZOOKEEPER-1198 QuorumPeer contains unused constructor |
Sub-task | Open | Trivial | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 04/Nov/11 03:58 | 04/Nov/11 03:58 | 0 | 0 | The following constructor in QuorumPeer seems to be never used, starting at line 370 in my branch: {code:java}
/**
 * For backward compatibility purposes, we instantiate QuorumMaj by default.
 */
public QuorumPeer(Map<Long, QuorumServer> quorumPeers, File dataDir,
        File dataLogDir, int electionType, long myid, int tickTime,
        int initLimit, int syncLimit, ServerCnxnFactory cnxnFactory)
        throws IOException {
    this(quorumPeers, dataDir, dataLogDir, electionType, myid, tickTime,
            initLimit, syncLimit, cnxnFactory,
            new QuorumMaj(countParticipants(quorumPeers)));
}
{code} |
216018 | No Perforce job exists for this issue. | 0 | 42001 | 8 years, 20 weeks, 6 days ago | 0|i07jyn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1285 | make DataTree immutable |
Improvement | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 03/Nov/11 10:38 | 01/May/13 22:29 | 0 | 1 | ZOOKEEPER-1287 | ZOOKEEPER-1228, ZOOKEEPER-1230, ZOOKEEPER-1259, ZOOKEEPER-1255, ZOOKEEPER-1276, ZOOKEEPER-1279, ZOOKEEPER-1253, ZOOKEEPER-1258, ZOOKEEPER-1092 | Having an immutable DataTree structure in the ZooKeeper server is an ambitious goal but is possible. Advantages would be: - No synchronization needed when accessing the DataTree. - The snapshotter thread gets an immutable datatree and will write a consistent DataTree to the disk. - No headaches whether multi transactions could lead to issues with (de)serialization. - Much better testability. - No concurrency - No headaches. - I hope for considerable speed improvements. Maybe also some memory savings, at least from refactorings possible after this step. - Statistical Data about the tree can be updated on every tree modification and is always consistent. The need to save statistical data in extra nodes for the quota feature goes away. Possible further improvements: Read requests actually don't need to enter the processor pipeline. Instead each server connection could get a reference to a (zxid, tree) tuple. Updates are delivered as (zxid, newTree, triggerWatchesCallback) to the server connections. The watches could be managed at each server connection instead of centrally at the DataTree. |
215914 | No Perforce job exists for this issue. | 0 | 42002 | 8 years, 20 weeks, 6 days ago | 0|i07jyv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
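The copy-on-write style ZOOKEEPER-1285 argues for can be sketched as follows (all names are hypothetical, not the actual DataTree API): a mutation returns a new tree instead of modifying the shared one, so readers and the snapshot thread can keep using the old instance without locks.

```java
// Minimal immutable-tree sketch for ZOOKEEPER-1285 (hypothetical names,
// not ZooKeeper code). Each update copies the node map and returns a new
// instance; the previous version stays valid and consistent.
import java.util.HashMap;
import java.util.Map;

public final class ImmutableTreeSketch {
    private final Map<String, byte[]> nodes;

    private ImmutableTreeSketch(Map<String, byte[]> nodes) {
        this.nodes = nodes;
    }

    static ImmutableTreeSketch empty() {
        return new ImmutableTreeSketch(new HashMap<>());
    }

    // Returns a NEW tree containing the extra node; 'this' is untouched.
    ImmutableTreeSketch withNode(String path, byte[] data) {
        Map<String, byte[]> copy = new HashMap<>(nodes);
        copy.put(path, data);
        return new ImmutableTreeSketch(copy);
    }

    int nodeCount() {
        return nodes.size();
    }

    public static void main(String[] args) {
        ImmutableTreeSketch v1 = empty();
        ImmutableTreeSketch v2 = v1.withNode("/a", new byte[0]);
        // A snapshotter holding v1 still sees a consistent tree while
        // writers have moved on to v2 — no synchronization needed.
        System.out.println(v1.nodeCount() + " " + v2.nodeCount()); // prints "0 1"
    }
}
```

A production version would use a persistent (structurally shared) map rather than a full copy per update; the full copy here just keeps the sketch short.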
| ZooKeeper | ZOOKEEPER-1284 | ZOOKEEPER-1198 Cleanup minor PrepRequestProcessor issues |
Sub-task | Patch Available | Minor | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 03/Nov/11 05:35 | 03/Nov/11 06:13 | 0 | 1 | Instead of having if statements in every switch case in pRequest2Txn, it is possible to have only one if statement before the switch case in pRequest. | 215882 | No Perforce job exists for this issue. | 1 | 42003 | 8 years, 21 weeks ago | 0|i07jz3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1283 | building 3.3 branch fails with Ant 1.8.2 (success with 1.7.1 though) |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 03/Nov/11 01:35 | 29/Nov/11 12:54 | 15/Nov/11 13:10 | 3.3.3 | 3.3.4 | build | 0 | 1 | I tried to compile 3.3.3 as well as the current 3.3 branch head; in both cases the build fails with ant 1.8.2, however 1.7.0 is successful. Here's the error: {noformat}
Testsuite: org.apache.zookeeper.VerGenTest
Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 0.009 sec
Testcase: warning took 0.001 sec
  FAILED
Class org.apache.zookeeper.VerGenTest has no public constructor TestCase(String name) or TestCase()
junit.framework.AssertionFailedError: Class org.apache.zookeeper.VerGenTest has no public constructor TestCase(String name) or TestCase()
{noformat} |
215854 | No Perforce job exists for this issue. | 1 | 32640 | 8 years, 19 weeks, 6 days ago | committed revision 1202340 | 0|i05y7b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1282 | ZOOKEEPER-1264 Learner.java not following Zab 1.0 protocol - setCurrentEpoch should be done upon receipt of NEWLEADER (before acking it) and not upon receipt of UPTODATE |
Sub-task | Closed | Major | Fixed | Benjamin Reed | Alexander Shraer | Alexander Shraer | 02/Nov/11 18:33 | 23/Nov/11 14:22 | 05/Nov/11 16:58 | 3.4.0 | 3.4.0, 3.5.0 | server | 0 | 0 | ZOOKEEPER-1264 | according to https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0 phase 2 part 2, "Once it receives NEWLEADER(e) it atomically applies the new state and sets f.currentEpoch = e." In Learner.java self.setCurrentEpoch(newEpoch) is done after receiving UPTODATE and not before acking the NEWLEADER message as it should be: {code:java}
case Leader.UPTODATE:
    if (!snapshotTaken) {
        zk.takeSnapshot();
    }
    self.cnxnFactory.setZooKeeperServer(zk);
    break outerLoop;
case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
    zk.takeSnapshot();
    snapshotTaken = true;
    writePacket(new QuorumPacket(Leader.ACK, newLeaderZxid, null, null), true);
    break;
        }
    }
}
long newEpoch = ZxidUtils.getEpochFromZxid(newLeaderZxid);
self.setCurrentEpoch(newEpoch);
{code} |
215824 | No Perforce job exists for this issue. | 0 | 33299 | 8 years, 20 weeks, 4 days ago | Revision 1198053 | 0|i0629r: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
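The ordering requirement behind ZOOKEEPER-1282 can be modeled as a tiny runnable sketch (all names are hypothetical, not Learner.java code): a follower must persist currentEpoch = e before it acks NEWLEADER(e), so a crash after the ack can never leave the leader believing in an epoch the follower never recorded.

```java
// Hypothetical model of the Zab 1.0 ordering discussed in ZOOKEEPER-1282
// (not ZooKeeper code). The event list records that the epoch is
// persisted strictly before the NEWLEADER ack is sent.
import java.util.ArrayList;
import java.util.List;

public class NewLeaderOrderingSketch {
    long currentEpoch = 0;                 // stands in for the persisted currentEpoch file
    final List<String> events = new ArrayList<>();

    void persistEpoch(long e) {
        currentEpoch = e;
        events.add("persist(" + e + ")");
    }

    void ackNewLeader(long e) {
        events.add("ack(" + e + ")");
    }

    // Zab 1.0 ordering: persist first, then ack.
    void onNewLeader(long e) {
        persistEpoch(e);
        ackNewLeader(e);
    }

    public static void main(String[] args) {
        NewLeaderOrderingSketch follower = new NewLeaderOrderingSketch();
        follower.onNewLeader(5);
        System.out.println(follower.events); // prints "[persist(5), ack(5)]"
    }
}
```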
| ZooKeeper | ZOOKEEPER-1281 | Stat and srvr 4 letter commands are useless on the leader when leaderServes = false |
Improvement | Open | Major | Unresolved | Unassigned | Daniel Lord | Daniel Lord | 02/Nov/11 13:25 | 18/Sep/17 18:46 | 3.3.3 | server | 2 | 4 | When leaderServes = false the leader responds to the stat/srvr 4 letter words with simply "this ZooKeeper instance is not currently serving requests". While I agree that is an accurate statement, it's not terribly useful for monitoring programs. Additionally, if members of the ensemble are not currently in the quorum it becomes impossible to tell who is out of the quorum and who is the leader of the quorum. I'm not sure if the leader should have a specially formatted response for stat/srvr or if it should simply display less information (no connections for example). |
215765 | No Perforce job exists for this issue. | 0 | 42004 | 2 years, 26 weeks, 3 days ago | 0|i07jzb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1280 | Add current epoch number and timestamp of when it began to 4 letter words (stat, srvr, mntr maybe?) |
Improvement | Open | Major | Unresolved | Unassigned | Daniel Lord | Daniel Lord | 02/Nov/11 13:21 | 02/Nov/11 13:21 | 3.3.3 | server | 0 | 1 | It would be nice if there were some stats displayed about the current epoch in the 4 letter words. At the very least it would be nice to expose the current epoch number (I know I could parse it from the Zxid but exposing it directly is more transparent) and the date of when the epoch began. | 215763 | No Perforce job exists for this issue. | 0 | 42005 | 8 years, 21 weeks, 1 day ago | 0|i07jzj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1279 | ZOOKEEPER-1198 Only SessionTracker should hold reference to sessionsWithTimeouts |
Sub-task | Open | Minor | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 02/Nov/11 13:13 | 01/May/13 22:29 | 0 | 1 | ZOOKEEPER-1285 | Currently the ZKDataBase, ZooKeeperServer and SessionTrackers hold references to the same map, called sessionsWithTimeouts everywhere. That's very confusing. It is possible to have the reference only in the SessionTrackers and take it from there if it should ever be needed outside. | 215761 | No Perforce job exists for this issue. | 0 | 42006 | 8 years, 21 weeks, 1 day ago | 0|i07jzr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1278 | acceptedEpoch not handling zxid rollover in lower 32bits |
Bug | Resolved | Blocker | Duplicate | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 02/Nov/11 12:54 | 22/Mar/12 20:36 | 22/Mar/12 20:36 | 3.4.0, 3.5.0 | server | 0 | 0 | ZOOKEEPER-1277, ZOOKEEPER-335 | When the lower 32bits of a zxid "roll over" (zxid is a 64 bit number, however the upper 32 are considered the epoch number) the epoch number (upper 32 bits) are incremented and the lower 32 start at 0 again. This should work fine, however, afaict, in the current 3.4/3.5 the acceptedEpoch/currentEpoch files are not being updated for this case. See ZOOKEEPER-335 for changes from 3.3 branch. |
215756 | No Perforce job exists for this issue. | 1 | 32641 | 8 years, 1 week ago | Workaround: there is a simple workaround for this issue. Force a leader re-election before the lower 32bits reach 0xffffffff Most users won't even see this given the number of writes on a typical installation - say you are doing 500 writes/second, you'd see this after ~3 months if the quorum is stable, any changes (such as upgrading the server software) would cause the xid to be reset, thereby staving off this issue for another period. |
0|i05y7j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1277 | servers stop serving when lower 32bits of zxid roll over |
Bug | Resolved | Critical | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 02/Nov/11 12:46 | 28/Feb/19 15:20 | 15/Mar/12 12:55 | 3.3.3 | 3.3.5, 3.4.4, 3.5.0 | server | 0 | 11 | ZOOKEEPER-2789, ZOOKEEPER-1278, ZOOKEEPER-3253 | When the lower 32bits of a zxid "roll over" (zxid is a 64 bit number, however the upper 32 are considered the epoch number) the epoch number (upper 32 bits) are incremented and the lower 32 start at 0 again. This should work fine, however in the current 3.3 branch the followers see this as a NEWLEADER message, which it's not, and effectively stop serving clients. Attached clients seem to eventually time out given that heartbeats (or any operation) are no longer processed. The follower doesn't recover from this. I've tested this out on 3.3 branch and confirmed this problem, however I haven't tried it on 3.4/3.5. It may not happen on the newer branches due to ZOOKEEPER-335, however there is certainly an issue with updating the "acceptedEpoch" files contained in the datadir. (I'll enter a separate jira for that) |
215755 | No Perforce job exists for this issue. | 8 | 12511 | 2 years, 43 weeks, 1 day ago | Workaround: there is a simple workaround for this issue. Force a leader re-election before the lower 32bits reach 0xffffffff Most users won't even see this given the number of writes on a typical installation - say you are doing 500 writes/second, you'd see this after ~3 months if the quorum is stable, any changes (such as upgrading the server software) would cause the xid to be reset, thereby staving off this issue for another period. |
Reviewed
|
0|i02hyn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
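The zxid layout described in ZOOKEEPER-1277/1278 — a 64-bit value whose upper 32 bits are the epoch and whose lower 32 bits are a per-epoch counter — can be sketched as follows (class and method names are illustrative, not the actual ZxidUtils API):

```java
// Illustrative sketch of the zxid packing behind the rollover bugs above
// (hypothetical names, not ZooKeeper's ZxidUtils).
public class ZxidSketch {
    static long getEpoch(long zxid) {
        return zxid >> 32;          // upper 32 bits
    }

    static long getCounter(long zxid) {
        return zxid & 0xffffffffL;  // lower 32 bits
    }

    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    public static void main(String[] args) {
        long z = makeZxid(0x1, 0x3); // 0x100000003, as in the stat output above
        System.out.println(Long.toHexString(z)); // prints "100000003"

        // When the counter reaches 0xffffffff, the next zxid must bump the
        // epoch and restart the counter at 0 — the step that 3.3 followers
        // misread as a NEWLEADER message.
        long last = makeZxid(0x1, 0xffffffffL);
        long next = makeZxid(getEpoch(last) + 1, 0);
        System.out.println(Long.toHexString(next)); // prints "200000000"
    }
}
```

The workaround quoted in both issues follows directly from this layout: forcing a leader re-election before the counter reaches 0xffffffff starts a fresh epoch with the counter back at 0.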
| ZooKeeper | ZOOKEEPER-1276 | ZOOKEEPER-1198 ZKDatabase should not hold reference to FileTxnSnapLog |
Sub-task | Open | Minor | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 02/Nov/11 12:38 | 01/May/13 22:29 | 0 | 0 | ZOOKEEPER-1285 | The ZkDatabase class contains a reference to a FileTxnSnapLog although it doesn't need it. It has four methods that just forward calls to the instance and two methods that could receive an instance of FileTxnSnapLog instead of refering to a member of _this_. | 215751 | No Perforce job exists for this issue. | 0 | 42007 | 8 years, 21 weeks, 1 day ago | 0|i07jzz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1275 | ZOOKEEPER-233 ZooKeeper client is only caller of server.DataTree.copyStat() |
Sub-task | Resolved | Minor | Won't Fix | Thomas Koch | Thomas Koch | Thomas Koch | 01/Nov/11 06:01 | 19/Mar/12 02:18 | 19/Mar/12 02:18 | build, java client | 0 | 1 | This static method should be moved out of the o.a.z.server package. To my knowledge it is the only coupling of ZK client code to server code and the server doesn't even call this method. | 215497 | No Perforce job exists for this issue. | 1 | 33300 | 8 years, 9 weeks ago | 0|i0629z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1274 | Support child watches to be displayed with 4 letter zookeeper commands (i.e. wchs, wchp and wchc) |
Bug | Open | Major | Unresolved | Chris Nauroth | amith | amith | 31/Oct/11 03:06 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | server | 4 | 7 | ZOOKEEPER-2062 | Zookeeper Server | Currently only data watchers (created by exists() and getData()) are displayed with the wchs, wchp and wchc 4 letter commands. It would be useful to get the information related to child watchers (getChildren()) with the 4 letter words as well. |
215312 | No Perforce job exists for this issue. | 2 | 32642 | 4 years, 46 weeks, 6 days ago | 0|i05y7r: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1273 | Copy'n'pasted unit test |
Bug | Resolved | Trivial | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 30/Oct/11 14:33 | 01/Nov/11 06:57 | 31/Oct/11 16:06 | 3.5.0 | tests | 0 | 2 | Probably caused by the usage of a legacy VCS, a code duplication happened when the project moved from Sourceforge to Apache (ZOOKEEPER-38). The following file can be deleted: src/java/test/org/apache/zookeeper/server/DataTreeUnitTest.java. src/java/test/org/apache/zookeeper/test/DataTreeTest.java was an exact copy of the above until ZOOKEEPER-1046 added an additional test case only to the latter. Do I need to upload a patch file for this? |
215279 | No Perforce job exists for this issue. | 1 | 32643 | 8 years, 21 weeks, 2 days ago |
Reviewed
|
0|i05y7z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1272 | ZooKeeper.multi() could violate API if server misbehaves |
Bug | Open | Minor | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 29/Oct/11 09:36 | 30/Oct/11 13:11 | 0 | 0 | The client API method ZooKeeper.multi() promises that the KeeperException it throws, in case one of the multi ops fails, contains a list of individual results. The method ZooKeeper.multiInternal() however throws a KeeperException if the returned response header has an error code != 0. This should actually never happen if the server does not misbehave, since the error code of a multi response is always zero, but I managed to trigger this code path with my refactorings. |
215218 | No Perforce job exists for this issue. | 0 | 32644 | 8 years, 21 weeks, 4 days ago | 0|i05y87: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1271 | testEarlyLeaderAbandonment failing on solaris - clients not retrying connection |
Bug | Closed | Blocker | Fixed | Mahadev Konar | Patrick D. Hunt | Patrick D. Hunt | 28/Oct/11 18:33 | 23/Nov/11 14:21 | 02/Nov/11 17:59 | 3.3.4, 3.4.0, 3.5.0 | 3.3.4, 3.4.0, 3.5.0 | java client | 0 | 2 | ZOOKEEPER-1174 | See: https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch34_solaris/1/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testEarlyLeaderAbandonment/ Notice that the clients attempt to connect before the servers have bound, then 30 seconds later, after seemingly no further client activity we see: 2011-10-28 21:40:56,828 [myid:] - INFO [main-SendThread(localhost:11227):ClientCnxn$SendThread@1057] - Client session timed out, have not heard from server in 30032ms for sessionid 0x0, closing socket connection and attempting reconnect I believe this is different from ZOOKEEPER-1270 because in the 1270 case it seems like the clients are attempting to connect but the servers are not accepting (notice the stat commands are being dropped due to no server running) |
215187 | No Perforce job exists for this issue. | 6 | 32645 | 8 years, 20 weeks, 1 day ago |
Reviewed
|
0|i05y8f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1270 | testEarlyLeaderAbandonment failing intermittently, quorum formed, no serving. |
Bug | Closed | Blocker | Fixed | Flavio Paiva Junqueira | Patrick D. Hunt | Patrick D. Hunt | 28/Oct/11 18:25 | 23/Nov/11 14:22 | 05/Nov/11 07:46 | 3.4.0, 3.5.0 | server | 0 | 4 | ZOOKEEPER-1194 | Looks pretty serious - quorum is formed but no clients can attach. Will attach logs momentarily. This test was introduced in the following commit (all three jira commit at once): ZOOKEEPER-335. zookeeper servers should commit the new leader txn to their logs. ZOOKEEPER-1081. modify leader/follower code to correctly deal with new leader ZOOKEEPER-1082. modify leader election to correctly take into account current |
215186 | No Perforce job exists for this issue. | 15 | 32646 | 8 years, 20 weeks, 3 days ago |
Reviewed
|
0|i05y8n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1269 | Multi deserialization issues |
Bug | Closed | Major | Fixed | Camille Fournier | Camille Fournier | Camille Fournier | 28/Oct/11 17:34 | 16/Dec/11 20:33 | 09/Dec/11 17:25 | 3.4.0 | 3.4.1, 3.5.0 | server | 0 | 2 | From the mailing list: FileTxnSnapLog.restore contains a code block handling a NODEEXISTS failure during deserialization. The problem is explained there in a code comment. The code block however is only executed for a CREATE txn, not for a multiTxn containing a CREATE. Even if the mentioned code block would also be executed for multi transactions, it needs adaption for multi transactions. What, if after the first failed transaction in a multi txn during deserialization, there would be subsequent transactions in the same multi that would also have failed? We don't know, since the first failed transaction hides the information about the remaining transactions. |
215177 | No Perforce job exists for this issue. | 1 | 32647 | 8 years, 15 weeks, 5 days ago |
Reviewed
|
0|i05y8v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1268 | problems with read only mode, intermittent test failures and ERRORs in the log |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 28/Oct/11 14:01 | 23/Nov/11 14:22 | 01/Nov/11 03:15 | 3.4.0, 3.5.0 | 3.4.0, 3.5.0 | server | 0 | 1 | I'm having a lot of problems testing the 3.4.0 release candidate (0). I'm seeing frequent failures in RO unit tests, also the solaris tests are broken on jenkins, some of which is due to RO mode: https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_trunk_solaris/30/#showFailuresLink I'm also seeing ERROR level messages in the logs during test runs that are a result of attempting to start RO mode. Given this is a new feature, one that could be very disruptive, I think we need to control whether the feature is enabled or not through a config option (system prop is fine), disabled by default. I'll look at the RO mode tests to see if I can find the cause of the failures on solaris, but I may also turn off these tests for the time being. (I need to look at this further). I'm marking this as a blocker for 3.4.0, Mahadev LMK if you feel similarly or whether I should be shooting for 3.4.1 with this. (or perhaps I'm just way off in general). |
215148 | No Perforce job exists for this issue. | 2 | 32648 | 8 years, 21 weeks, 2 days ago |
Reviewed
|
0|i05y93: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1267 | ZOOKEEPER-1198 closeSession flag in finalRequestProcessor is superfluous |
Sub-task | Resolved | Trivial | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 28/Oct/11 10:16 | 29/Oct/11 06:56 | 28/Oct/11 13:00 | 0 | 0 | The variable can be removed and instead where it is evaluated one can just check whether the request.type was OpCode.closesession. Removes one indirection from your head in a method that's long enough already. | 215118 | No Perforce job exists for this issue. | 1 | 33301 | 8 years, 21 weeks, 5 days ago |
Reviewed
|
0|i062a7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1266 | ZOOKEEPER-1198 "request.getHdr() != null" and "isQuorum" are identical |
Sub-task | Open | Minor | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 28/Oct/11 09:58 | 28/Oct/11 12:52 | 0 | 1 | FinalRequestProcessor has this code block: {code:java}
if (request.getHdr() != null) {
    ... SNIP ...
}
// do not add non quorum packets to the queue.
if (request.isQuorum()) {
    zks.getZKDatabase().addCommittedProposal(request);
}
{code} Both conditions are equivalent, so the two if blocks could actually be merged into one block. |
215113 | No Perforce job exists for this issue. | 1 | 42008 | 8 years, 21 weeks, 6 days ago | 0|i07k07: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1265 | Normalize switch cases lists on request types |
Bug | Resolved | Major | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 28/Oct/11 09:35 | 29/Oct/11 06:56 | 28/Oct/11 12:38 | 0 | 2 | As discussed on the list, it's probably an error that the ReadOnlyRequestProcessor does not have multi alongside the other write operations. Adding check to the lists may not make a difference by now, since the ZK client does not expose check as a first level request but only encapsulated inside a multi request. However, from a logical view, check belongs in these lists. |
215109 | No Perforce job exists for this issue. | 1 | 32649 | 8 years, 21 weeks, 5 days ago |
Reviewed
|
0|i05y9b: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1264 | FollowerResyncConcurrencyTest failing intermittently |
Bug | Closed | Blocker | Fixed | Camille Fournier | Patrick D. Hunt | Patrick D. Hunt | 28/Oct/11 00:23 | 23/Nov/11 14:22 | 05/Nov/11 16:58 | 3.3.3, 3.4.0, 3.5.0 | 3.3.4, 3.4.0, 3.5.0 | tests | 0 | 5 | ZOOKEEPER-1282, ZOOKEEPER-1291 | ZOOKEEPER-1282 | The FollowerResyncConcurrencyTest test is failing intermittently. saw the following on 3.4: {noformat} junit.framework.AssertionFailedError: Should have same number of ephemerals in both followers expected:<11741> but was:<14001> at org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400) at org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196) at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) {noformat} |
215052 | No Perforce job exists for this issue. | 19 | 32650 | 8 years, 20 weeks, 4 days ago | Revision 1198053 | 0|i05y9j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1263 | fix handling of min/max session timeout value initialization |
Task | Resolved | Major | Fixed | Rakesh Radhakrishnan | Patrick D. Hunt | Patrick D. Hunt | 27/Oct/11 17:47 | 20/Jul/15 06:54 | 25/Mar/14 17:14 | 3.5.0 | server | 0 | 5 | ZOOKEEPER-1213, ZOOKEEPER-1227 | SLIDER-862 | This task rolls up the changes in subtasks for easier commit. (I'm about to submit the rolled up patch) | 215009 | No Perforce job exists for this issue. | 5 | 42009 | 6 years, 1 day ago | trunk: http://svn.apache.org/viewvc?view=revision&revision=1581522 |
Incompatible change
|
0|i07k0f: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1262 | Documentation for Lock recipe has major flaw |
Bug | Resolved | Major | Fixed | Jordan Zimmerman | Jordan Zimmerman | Jordan Zimmerman | 27/Oct/11 17:46 | 28/Dec/11 16:18 | 28/Dec/11 16:18 | 3.3.3 | 3.5.0 | documentation | 0 | 2 | The recipe for Locks documented here: http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks doesn't deal with the problem of create() succeeding but the server crashing before the result is returned. As written, if the server crashes before the result is returned the client can never know what sequential node was created for it. The way to deal with this is to embed the session ID in the node name. The Lock implementation in the ZK distro does this. But, the documentation will lead implementors to write bad code. | 215008 | No Perforce job exists for this issue. | 3 | 32651 | 8 years, 13 weeks, 1 day ago | Updated recipes to document how to use a GUID to manage a recoverable create() error. |
Reviewed
|
0|i05y9r: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
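The fix the release note above describes — using a GUID to recover from a create() whose reply was lost — can be sketched like this (all names are hypothetical; this is not the actual ZooKeeper recipe code): the client embeds a unique prefix in the sequential node name, so after a connection loss it can list the children and find its own node instead of blindly retrying create().

```java
// Sketch of the GUID-in-node-name idea from ZOOKEEPER-1262 (hypothetical
// names, no live ZooKeeper connection). After a lost create() reply, the
// client scans the children for its own unique prefix rather than
// retrying create(), which could leak a second lock node.
import java.util.List;
import java.util.UUID;

public class GuidLockNameSketch {
    static String newLockPrefix() {
        return "lock-" + UUID.randomUUID() + "-";
    }

    // Returns this client's node if the earlier create() actually
    // succeeded on the server, or null if it must be retried.
    static String findOwnNode(List<String> children, String prefix) {
        for (String child : children) {
            if (child.startsWith(prefix)) {
                return child;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String prefix = newLockPrefix();
        // Simulated children of the lock parent znode:
        List<String> children = List.of("lock-abc-0000000001", prefix + "0000000002");
        System.out.println(findOwnNode(children, prefix).endsWith("0000000002")); // prints "true"
    }
}
```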
| ZooKeeper | ZOOKEEPER-1261 | Make ZooKeeper code more Dependency Injection compliant. |
Improvement | Open | Major | Unresolved | Unassigned | Mahadev Konar | Mahadev Konar | 27/Oct/11 14:00 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | 0 | 0 | Our code base is a little tricky to unit test and also needs fixing to be able to maintainable long term. We should make our components DI compliant, so that they are easier to test and maintainable in the long term. This is just an umbrella jira, I am sure we will need a huge code churn to be able to achieve this goal. | 214957 | No Perforce job exists for this issue. | 0 | 42010 | 8 years, 22 weeks ago | 0|i07k0n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1260 | Audit logging in ZooKeeper servers. |
New Feature | Resolved | Major | Fixed | Mohammad Arshad | Mahadev Konar | Mahadev Konar | 27/Oct/11 13:49 | 19/Nov/19 05:18 | 11/Nov/19 07:59 | 3.6.0 | server | 6 | 16 | 0 | 42000 | ZOOKEEPER-2287, RANGER-924 | Lots of users have had questions on debugging which client changed what znode and what updates went through a znode. We should add audit logging as in Hadoop (look at Namenode Audit logging) to log which client changed what in the zookeeper servers. This could just be a log4j audit logger. | 100% | 100% | 42000 | 0 | pull-request-available | 214956 | No Perforce job exists for this issue. | 2 | 42011 | 18 weeks, 3 days ago |
| ZooKeeper | ZOOKEEPER-1259 | ZOOKEEPER-1198 central mapping from type to txn record class |
Sub-task | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 27/Oct/11 07:56 | 05/Feb/20 07:17 | 3.7.0, 3.5.8 | 0 | 2 | ZOOKEEPER-1285 | There are two places where large switch statements do nothing else than get the correct Record class according to a txn type. Provided a static map in SerializeUtils from type to Class<? extends Record> and a method to retrieve a new txn Record instance for a type. Code size reduced by 28 lines. |
214897 | No Perforce job exists for this issue. | 4 | 42012 | 5 years, 51 weeks, 3 days ago |
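ZOOKEEPER-1259 above proposes replacing the large switch statements with one central static map from txn type to Record class. A minimal sketch of that idea, assuming illustrative type codes and stand-in record classes (not the real jute Record types or OpCode values):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Central type-to-record mapping replacing the switch statements.
// The Record interface and txn classes here are local stand-ins.
public class TxnFactory {
    interface Record {}
    static class CreateTxn implements Record {}
    static class DeleteTxn implements Record {}

    private static final Map<Integer, Supplier<Record>> TXN_BY_TYPE = new HashMap<>();
    static {
        TXN_BY_TYPE.put(1, CreateTxn::new);  // illustrative code for "create"
        TXN_BY_TYPE.put(2, DeleteTxn::new);  // illustrative code for "delete"
    }

    // Returns a fresh txn Record instance for the given type.
    public static Record newTxn(int type) {
        Supplier<Record> supplier = TXN_BY_TYPE.get(type);
        if (supplier == null) {
            throw new IllegalArgumentException("unknown txn type: " + type);
        }
        return supplier.get();
    }
}
```

Each former switch arm collapses into one map entry, so adding a new txn type touches a single line instead of every switch.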
| ZooKeeper | ZOOKEEPER-1258 | ZOOKEEPER-1198 Move MultiResponse creation out of FinalRequestProcessor |
Sub-task | Patch Available | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 27/Oct/11 07:08 | 07/Oct/13 21:22 | 0 | 1 | ZOOKEEPER-1285 | There is a longish code block in the switch case of the FinalRequestProcessor iterating over rc.multiResult and building a MultiResponse. Moved the code where it belongs, to MultiResponse and OpResult. | 214894 | No Perforce job exists for this issue. | 1 | 42013 | 6 years, 24 weeks, 2 days ago |
| ZooKeeper | ZOOKEEPER-1257 | ZOOKEEPER-1198 Rename MultiTransactionRecord to MultiRequest |
Sub-task | Open | Critical | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 27/Oct/11 05:51 | 18/Mar/16 16:03 | 0 | 3 | Understanding the code behind multi operations doesn't get any easier when the code violates naming consistency. All other Request classes are called xxxRequest; only for multi it's xxxTransactionRecord! Also "Transaction" is wrong, because there is the concept of transactions that are transmitted between quorum peers or committed to disk. MultiTransactionRecord however is a _Request_ from a client. |
214886 | No Perforce job exists for this issue. | 0 | 42014 | 4 years, 6 days ago |
| ZooKeeper | ZOOKEEPER-1256 | ClientPortBindTest is failing on Mac OS X |
Bug | Closed | Major | Fixed | Flavio Paiva Junqueira | Daniel Gómez Ferro | Daniel Gómez Ferro | 27/Oct/11 03:45 | 17/May/17 23:43 | 29/Jul/16 19:05 | 3.5.3, 3.6.0 | tests | 0 | 7 | ZOOKEEPER-1954, ZOOKEEPER-2482 | Mac OS X | ClientPortBindTest is failing consistently on Mac OS X. | 214880 | No Perforce job exists for this issue. | 6 | 12508 | 3 years, 33 weeks, 6 days ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1255 | ZOOKEEPER-1198 unused fields in DataTree.ProcessTxnResult |
Sub-task | Open | Minor | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 27/Oct/11 03:38 | 01/May/13 22:29 | 0 | 2 | ZOOKEEPER-1285 | The fields zxid, cxid and clientId in ProcessTxnResult are never used. cxid and clientId are used in equals() and hashCode() but the class is never ever used as a key or compared. Keeping equals() and hashCode() "just in case" is a bad idea: http://www.infoq.com/news/2011/05/less-code-is-better |
214879 | No Perforce job exists for this issue. | 1 | 42015 | 8 years, 21 weeks, 3 days ago |
| ZooKeeper | ZOOKEEPER-1254 | test correct watch handling with multi ops |
Improvement | Resolved | Major | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 26/Oct/11 10:25 | 27/Oct/11 06:54 | 26/Oct/11 12:58 | 0 | 1 | I was wondering, what happens with watches that would be triggered by a multi op if subsequent ops fail. I didn't find a test for this, wrote one and everything was fine. :-) The patch contains two additional test cases. |
214751 | No Perforce job exists for this issue. | 1 | 33302 | 8 years, 22 weeks ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1253 | ZOOKEEPER-1198 return value of DataTree.createNode is never used |
Sub-task | Resolved | Trivial | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 26/Oct/11 09:30 | 01/May/13 22:29 | 14/Dec/11 18:40 | 3.5.0 | 0 | 1 | ZOOKEEPER-1285 | createNode returns the unmodified path string which it has received as parameter. Consequently no caller uses the return value. | 214745 | No Perforce job exists for this issue. | 2 | 33303 | 8 years, 15 weeks ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1252 | ZOOKEEPER-1198 remove unused method o.a.z.test.AxyncTest.restart() |
Sub-task | Resolved | Trivial | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 26/Oct/11 08:55 | 28/Oct/11 06:55 | 27/Oct/11 12:24 | 3.5.0 | 0 | 0 | see Summary. | 214735 | No Perforce job exists for this issue. | 2 | 33304 | 8 years, 21 weeks, 6 days ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1251 | ZOOKEEPER-1198 call checkSession at begin of PrepRequestProcessor.pRequest |
Sub-task | Patch Available | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 26/Oct/11 07:07 | 12/Nov/11 07:15 | 0 | 0 | There are 6 locations that call checkSession. This can be reduced to one location, which also makes it much clearer in which cases checkSession is or is not called. Note that now the SessionMoved|Expired error is checked before the check for a marshalling error. However it shouldn't matter which error gets reported. |
214713 | No Perforce job exists for this issue. | 3 | 42016 | 8 years, 19 weeks, 5 days ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1250 | ZOOKEEPER-1198 trigger jenkins dummy issue |
Sub-task | Resolved | Trivial | Invalid | Thomas Koch | Thomas Koch | Thomas Koch | 25/Oct/11 13:09 | 02/Nov/11 11:58 | 02/Nov/11 11:58 | 0 | 1 | Sorry, I don't have my own servers for testing, so I need to upload patches here to run the ZK test suite. | 214578 | No Perforce job exists for this issue. | 9 | 33305 | 8 years, 21 weeks, 1 day ago |
| ZooKeeper | ZOOKEEPER-1249 | jline should be an optional maven dependency |
Improvement | Resolved | Trivial | Duplicate | Unassigned | David Smiley | David Smiley | 25/Oct/11 11:40 | 01/Sep/14 03:10 | 11/Oct/13 12:41 | build | 0 | 2 | ZOOKEEPER-1655 | When a project adds a maven dependency to zookeeper, they probably don't want the jline dependency. jline should have <optional>true</optional> in zookeeper's maven pom. | 214551 | No Perforce job exists for this issue. | 0 | 42017 | 6 years, 23 weeks, 6 days ago |
| ZooKeeper | ZOOKEEPER-1248 | ZOOKEEPER-1198 multi transaction sets request.exception without reason |
Sub-task | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 25/Oct/11 09:18 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | 0 | 1 | I'm trying to understand the purpose of the exception field in request. This isn't made easier by the fact that the multi case in PrepRequestProcessor sets the exception without reason. The only code that calls request.getException() is in FinalRequestProcessor and this code only acts when the operation _is not_ a multi operation. |
214531 | No Perforce job exists for this issue. | 3 | 42018 | 8 years, 15 weeks, 1 day ago |
| ZooKeeper | ZOOKEEPER-1247 | ZOOKEEPER-1198 dead code in PrepRequestProcessor.pRequest multi case |
Sub-task | Resolved | Major | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 25/Oct/11 07:24 | 28/Oct/11 06:55 | 27/Oct/11 18:58 | 3.5.0 | 0 | 0 | There's an if statement in the for loop which sets the request.hdr.type and request.txn in case that an error happened in the preceding multiop. However hdr and txn are overwritten anyways at the end of the multi case. The values set are only used a bit later to serialize them. This could better be achieved with local variables holding the temporary hdr and txn. Also the if condition (ke == null) in the catch block is pointless, since the surrounding if(ke != null) makes sure that the catch block could only ever be reached in a loop where ke == null. |
214513 | No Perforce job exists for this issue. | 1 | 33306 | 8 years, 21 weeks, 6 days ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1246 | ZOOKEEPER-1198 Dead code in PrepRequestProcessor catch Exception block |
Sub-task | Closed | Blocker | Fixed | Camille Fournier | Thomas Koch | Thomas Koch | 25/Oct/11 05:57 | 23/Nov/11 14:22 | 02/Nov/11 14:31 | 3.4.0, 3.5.0 | 0 | 2 | This is a regression introduced by ZOOKEEPER-965 (multi transactions). The catch(Exception e) block in PrepRequestProcessor.pRequest contains an if block with condition request.getHdr() != null. This condition will always evaluate to false since the changes in ZOOKEEPER-965. This is caused by a change in sequence: Before ZK-965, the txnHeader was set _before_ the deserialization of the request. Afterwards the deserialization happens before request.setHdr is set. So the following RequestProcessors won't see the request as a failed one but as a Read request, since it doesn't have a hdr set. Notes: - it is very bad practice to catch Exception. The block should rather catch IOException - The check whether the TxnHeader is set in the request is used at several places to see whether the request is a read or write request. It isn't obvious for a newbie what it means whether a request has a hdr set or not. - at the beginning of pRequest the hdr and txn of request are set to null. However there is no chance that these fields could ever be non-null at this point. The code however suggests that this could be the case. There should rather be an assertion that confirms that these fields are indeed null. The practice of doing things "just in case", even if there is no chance that this case could happen, is a very stinky code smell and means that the code isn't understandable or trustworthy. - The multi transaction switch case block in pRequest is very hard to read, because it misuses the request.{hdr|txn} fields as local variables. |
214508 | No Perforce job exists for this issue. | 6 | 33307 | 8 years, 21 weeks, 1 day ago |
| ZooKeeper | ZOOKEEPER-1245 | ZOOKEEPER-1198 fix compiler warnings in contrib loggraph |
Sub-task | Open | Major | Unresolved | Unassigned | Thomas Koch | Thomas Koch | 25/Oct/11 03:00 | 25/Oct/11 03:01 | 0 | 2 | Eclipse shows around 300 compiler warnings in loggraph, many of them no-brainers like missing generics. | 214493 | No Perforce job exists for this issue. | 0 | 42019 | 8 years, 22 weeks, 2 days ago |
| ZooKeeper | ZOOKEEPER-1244 | ZOOKEEPER-1198 resolve remaining compiler warnings |
Sub-task | Open | Major | Unresolved | Unassigned | Thomas Koch | Thomas Koch | 25/Oct/11 02:59 | 25/Oct/11 02:59 | 0 | 1 | The ZooKeeper main codebase, including tests, currently triggers only 5 warnings in eclipse. The remaining 5 warnings should be fixed by people knowing these classes better than me. Once the warnings are down to zero it could be made a policy to keep it that way. The contrib loggraph however has around 300 warnings, many of them missing generics. |
214492 | No Perforce job exists for this issue. | 0 | 42020 | 8 years, 22 weeks, 2 days ago |
| ZooKeeper | ZOOKEEPER-1243 | New 4lw for short simple monitoring ldck |
Improvement | Resolved | Blocker | Won't Fix | Camille Fournier | Camille Fournier | Camille Fournier | 24/Oct/11 10:43 | 17/Nov/11 01:05 | 24/Oct/11 18:09 | 3.3.3, 3.4.0 | server | 0 | 0 | The existing monitoring fails so often due to https://issues.apache.org/jira/browse/ZOOKEEPER-1197 that we need a workaround. This introduces a short 4lw called ldck that just runs ServerStats.toString to get information about the server's leadership status. | 214355 | No Perforce job exists for this issue. | 3 | 33308 | 8 years, 22 weeks, 3 days ago | Srvr command duplicates. |
| ZooKeeper | ZOOKEEPER-1242 | Repeat add watcher, memory leak |
Bug | Open | Major | Unresolved | Peng Futian | Peng Futian | Peng Futian | 23/Oct/11 21:35 | 14/Dec/19 06:07 | 3.3.3 | 3.7.0 | c client | 1 | 1 | 3600 | 3600 | 0% | Redhat linux | When I repeatedly add a watcher, there is a memory leak. |
0% | 0% | 3600 | 3600 | patch | 214293 | No Perforce job exists for this issue. | 1 | 32652 | 8 years, 16 weeks, 1 day ago |
| ZooKeeper | ZOOKEEPER-1241 | Typo in ZooKeeper Recipes and Solutions documentation |
Bug | Resolved | Minor | Fixed | Jingguo Yao | Jingguo Yao | Jingguo Yao | 23/Oct/11 11:10 | 24/Oct/11 06:53 | 24/Oct/11 04:01 | 3.3.3 | 3.5.0 | documentation | 0 | 1 | 300 | 300 | 0% | In "if p is the lowest process node in L, wait on highest process node in P", "P" should be "L". | 0% | 0% | 300 | 300 | 214278 | No Perforce job exists for this issue. | 1 | 32653 | 8 years, 22 weeks, 3 days ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1240 | Compiler issue with redhat linux |
Bug | Open | Minor | Unresolved | Peng Futian | Peng Futian | Peng Futian | 21/Oct/11 22:16 | 14/Dec/19 06:08 | 3.3.3 | 3.7.0 | c client | 1 | 3 | 3600 | 3600 | 0% | Linux phy 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux gcc version 4.1.2 20070626 (Red Hat 4.1.2-14) |
When I compile the zookeeper c client in my project, there are some errors: ../../../include/zookeeper/recordio.h:70: error: expected unqualified-id before '__extension__' ../../../include/zookeeper/recordio.h:70: error: expected `)' before '__extension__' ../../../include/zookeeper/recordio.h:70: error: expected unqualified-id before ')' token |
0% | 0% | 3600 | 3600 | patch | 113481 | No Perforce job exists for this issue. | 1 | 32654 | 6 years, 30 weeks ago | Fix compile error under RedHat linux | c client |
| ZooKeeper | ZOOKEEPER-1239 | add logging/stats to identify fsync stalls |
Improvement | Closed | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 21/Oct/11 19:48 | 23/Nov/11 14:22 | 15/Nov/11 13:31 | 3.3.4, 3.4.0, 3.5.0 | server | 0 | 0 | We don't have any logging to identify fsync stalls. It's a somewhat common occurrence (after gc/swap issues) when trying to diagnose pipeline stalls - where outstanding requests start piling up and operational latency increases. We should have some sort of logging around this. e.g. if the fsync time exceeds some limit then log a warning, something like that. It would also be useful to publish "stat" information related to this. min/avg/max latency for fsync. This should also be exposed through JMX. |
113472 | No Perforce job exists for this issue. | 2 | 33309 | 8 years, 19 weeks, 1 day ago | committed to 3.3.4, 3.4, trunk rev 1202360 |
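ZOOKEEPER-1239 above asks for a warning when fsync exceeds some limit, plus min/avg/max latency stats. A minimal sketch of that pattern, assuming an illustrative threshold and names (this is not the committed code):

```java
// Times an fsync action, warns when it exceeds a threshold, and keeps
// simple latency stats that could be exposed via "stat" output or JMX.
public class FsyncMonitor {
    static final long WARN_THRESHOLD_MS = 1000; // illustrative default

    private long minMs = Long.MAX_VALUE;
    private long maxMs = 0;

    // Runs the given fsync action and returns the elapsed milliseconds;
    // prints a warning when the threshold is exceeded.
    public long timeFsync(Runnable fsync) {
        long start = System.currentTimeMillis();
        fsync.run();
        long elapsed = System.currentTimeMillis() - start;
        minMs = Math.min(minMs, elapsed);
        maxMs = Math.max(maxMs, elapsed);
        if (elapsed > WARN_THRESHOLD_MS) {
            System.err.println("fsync took " + elapsed
                    + "ms, exceeding the warning threshold of "
                    + WARN_THRESHOLD_MS + "ms");
        }
        return elapsed;
    }

    public long getMaxMs() { return maxMs; }
}
```

In the server, the Runnable would wrap the log channel's force() call; wrapping it this way keeps the timing logic testable without touching disk.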
| ZooKeeper | ZOOKEEPER-1238 | when the linger time was changed for NIO the patch missed Netty |
Bug | Closed | Major | Fixed | Skye Wanderman-Milne | Patrick D. Hunt | Patrick D. Hunt | 20/Oct/11 12:58 | 13/Mar/14 14:17 | 12/Jan/14 16:36 | 3.4.0, 3.5.0 | 3.4.6, 3.5.0 | server | 0 | 5 | ZOOKEEPER-1049 | from NettyServerCnxn: bq. bootstrap.setOption("child.soLinger", 2); See ZOOKEEPER-1049 |
92391 | No Perforce job exists for this issue. | 1 | 12497 | 6 years, 2 weeks ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1237 | ERRORs being logged when queued responses are sent after socket has closed. |
Bug | Resolved | Major | Duplicate | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 20/Oct/11 12:54 | 30/May/18 20:16 | 24/Jan/17 18:58 | 3.3.4, 3.4.0, 3.5.0 | 3.4.10 | server | 16 | 39 | ZOOKEEPER-2044 | After applying ZOOKEEPER-1049 to 3.3.3 (I believe the same problem exists in 3.4/3.5 but haven't tested this) I'm seeing the following exception more frequently: {noformat} Oct 19, 1:31:53 PM ERROR Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) {noformat} This is a long standing problem where we try to send a response after the socket has been closed. Prior to ZOOKEEPER-1049 this issues happened much less frequently (2 sec linger), but I believe it was possible. The timing window is just wider now. |
92387 | No Perforce job exists for this issue. | 1 | 32655 | 3 years, 13 weeks, 1 day ago |
| ZooKeeper | ZOOKEEPER-1236 | Security uses proprietary Sun APIs |
Bug | Resolved | Major | Fixed | Adalberto Medeiros | Patrick D. Hunt | Patrick D. Hunt | 20/Oct/11 12:05 | 04/Jul/12 14:25 | 30/Jun/12 02:31 | 3.4.0, 3.4.3 | 3.4.4, 3.5.0 | server | 0 | 5 | HADOOP-6941, ZOOKEEPER-1474, ZOOKEEPER-938, HADOOP-7211 | See HADOOP-7211 - Recent kerberos integration resulted in the same issue in ZK. {noformat} [javac] /home/phunt/dev/zookeeper/src/java/main/org/apache/zookeeper/server/auth/KerberosName.java:88: warning: sun.security.krb5.KrbException is Sun proprietary API and may be removed in a future release [javac] } catch (KrbException ke) { {noformat} |
92372 | No Perforce job exists for this issue. | 2 | 32656 | 7 years, 38 weeks, 5 days ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1235 | ZOOKEEPER-1198 store KeeperException messages in the Code enum |
Sub-task | Patch Available | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 19/Oct/11 06:08 | 05/Feb/20 07:11 | 3.7.0, 3.5.8 | 0 | 2 | Enums are just objects that can have properties. So instead of switching on the code integer, the message can be stored in the enum: OK(0) becomes OK(0, "ok"), and getCodeMessage(Code code) just returns code.getMessage(). |
89201 | No Perforce job exists for this issue. | 2 | 42021 | 3 years, 39 weeks, 2 days ago |
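ZOOKEEPER-1235 above proposes storing the message on the Code enum itself instead of switching on the integer. A minimal sketch of that enum shape (the values and messages here are illustrative, not the full KeeperException.Code):

```java
// Enum constants carry both the integer code and the message, so
// getCodeMessage() no longer needs a switch over the code value.
public class CodeMessages {
    enum Code {
        OK(0, "ok"),
        NONODE(-101, "KeeperErrorCode = NoNode");

        private final int value;
        private final String message;

        Code(int value, String message) {
            this.value = value;
            this.message = message;
        }

        int getValue() { return value; }
        String getMessage() { return message; }
    }

    // Replaces the old switch-based lookup.
    static String getCodeMessage(Code code) {
        return code.getMessage();
    }
}
```

Adding a new error code is then a single enum constant, and code and message can never fall out of sync across two switch statements.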
| ZooKeeper | ZOOKEEPER-1234 | ZOOKEEPER-1198 basic cleanup in LearnerHandler |
Sub-task | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 18/Oct/11 04:06 | 27/Oct/11 18:47 | 0 | 1 | - order class members: properties, constructor, methods - make properties private final - rename version to protocolVersion - The integer value 0x10000 should be extracted to a constant with a declarative name. But since I don't yet fully understand its purpose, I've no idea for the name of the constant. - Initialize the properties BinaryInputArchive ia, BinaryOutputArchive oa and BufferedOutputStream bufferedOutput in the constructor so that they can be made final. - Remove the call to sock.setSoTimeout. Both users of the class set the socket timeout anyway themselves. This also removes a link to the Leader class. - remove unused method packetToString. |
88767 | No Perforce job exists for this issue. | 2 | 42022 | 8 years, 22 weeks ago |
| ZooKeeper | ZOOKEEPER-1233 | ZOOKEEPER-1198 throw RuntimeExceptions for Exceptions that "should never happen" |
Sub-task | Open | Major | Unresolved | Unassigned | Thomas Koch | Thomas Koch | 17/Oct/11 07:07 | 17/Oct/11 07:07 | 0 | 0 | See Effective Java, 2nd Ed., item 65 ("Don't ignore exceptions"). If you're really sure that the exception will never appear, then you shouldn't fear to rethrow it. | 88133 | No Perforce job exists for this issue. | 0 | 42023 | 8 years, 23 weeks, 3 days ago |
| ZooKeeper | ZOOKEEPER-1232 | remove unused o.a.z.server.util.Profiler |
Improvement | Resolved | Minor | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 17/Oct/11 05:34 | 15/Dec/11 06:58 | 14/Dec/11 18:26 | 3.5.0 | 0 | 1 | The class is not used and it rather harms to suggest to people that this would be the right way to do micro-benchmarks on the JVM. It even harms to suggest that micro-benchmarks are the right way to approach Java performance issues. Quote from http://code.google.com/p/caliper/wiki/JavaMicrobenchmarks "Why would I ever write a microbenchmark then? Most of the time, you shouldn't! Instead, slavishly follow a principle of simple, clear coding that avoids clever optimizations. This is the type of code that JITs of the present and future are most likely to know how to optimize themselves. And that's a job which truly should be theirs, not yours. " Tools to do microbenchmarks: http://code.google.com/p/caliper/ (from the team that also does Guava, the Google Java library, recommended by Joshua Bloch himself) http://hype-free.blogspot.com/2010/01/choosing-java-profiler.html http://www.infoq.com/articles/java-profiling-with-open-source http://java.net/projects/japex Joshua Bloch on Performance Anxiety: http://java.dzone.com/articles/joshua-bloch-performance (follow link to parleys) |
87737 | No Perforce job exists for this issue. | 1 | 33310 | 8 years, 15 weeks ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1231 | ZOOKEEPER-1198 refactor int constants in o.a.z.s.q.Leader to enum |
Sub-task | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 17/Oct/11 03:42 | 18/Oct/11 14:30 | 0 | 0 | There are a couple of magic numbers in Leader, representing QuorumPackage types, like DIFF, TRUNC, SNAP, OBSERVERINFO, NEWLEADER, FOLLOWERINFO... These should rather be made an enum. | 87704 | No Perforce job exists for this issue. | 0 | 42024 | 8 years, 23 weeks, 2 days ago |
| ZooKeeper | ZOOKEEPER-1230 | ZOOKEEPER-1198 Cleanup FileTxnLog |
Sub-task | Open | Major | Unresolved | Unassigned | Thomas Koch | Thomas Koch | 15/Oct/11 13:15 | 01/May/13 22:29 | 0 | 0 | ZOOKEEPER-1285 | - remove Interface TxnLog. The discussion on the mailing list (subject: "Get rid of unnecessary Interfaces") didn't give a definite No...? - make things private where possible - does preAllocSize need to be static and therefore global? - the append method has one big if statement from beginning to end. make this a fast return - new private method to initialize a new logStream if logStream == null - move the check for a faulty transaction into the method o.a.z.s.persistence.Util.marshallTxnEntry - marshallTxnEntry is only ever used from the append method of FileTxnLog. However I've seen the same code somewhere else... - new private method that returns a checksum for a given bytebuffer and length |
86712 | No Perforce job exists for this issue. | 1 | 42025 | 8 years, 12 weeks, 4 days ago |
| ZooKeeper | ZOOKEEPER-1229 | C client hashtable_remove redundantly calls hash function |
Improvement | Resolved | Trivial | Fixed | Harsh J | Eric Abbott | Eric Abbott | 15/Oct/11 04:05 | 31/Dec/11 05:57 | 30/Dec/11 16:03 | 3.3.3 | 3.5.0 | c client | 0 | 0 | hashtable_remove appears to call the hash function in consecutive lines. As hash functions are generally cpu intensive, using the hashvalue returned from the first call will result in a performance improvement. {noformat} void * /* returns value associated with key */ hashtable_remove(struct hashtable *h, void *k) ... unsigned int hashvalue, index; hashvalue = hash(h,k); index = indexFor(h->tablelength,hash(h,k)); pE = &(h->table[index]); e = *pE; {noformat} |
newbie | 86663 | No Perforce job exists for this issue. | 1 | 33311 | 8 years, 12 weeks, 5 days ago |
Reviewed
|
| ZooKeeper | ZOOKEEPER-1228 | ZOOKEEPER-1198 Cleanup SessionTracker |
Sub-task | Open | Major | Unresolved | Unassigned | Thomas Koch | Thomas Koch | 14/Oct/11 11:10 | 01/May/13 22:29 | 0 | 1 | ZOOKEEPER-1285 | - fix ordering of class members - Remove Interface Session and rename inner class SessionImpl to Session - make properties private final where possible - rename SessionTrackerImpl to LeaderSessionTracker. There's a LearnerSessionTracker, so it makes sense. - make the following code clearer, what does the bitshifting do? {code} public static long initializeNextSession(long id) { long nextSid = 0; nextSid = (System.currentTimeMillis() << 24) >> 8; nextSid = nextSid | (id <<56); return nextSid; } {code} - replace the inner class SessionSet by a normal Set - make SessionTrackerImpl an instance of Runnable |
85600 | No Perforce job exists for this issue. | 0 | 42026 | 8 years, 23 weeks, 6 days ago |
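ZOOKEEPER-1228 above asks what the bit shifting in initializeNextSession does. A commented sketch, with the timestamp made an explicit parameter for testability (the real code calls System.currentTimeMillis() directly): the high byte of the 64-bit session id carries the server id, and the remaining 56 bits are derived from the timestamp.

```java
// Session-id layout: [ server id : 8 bits ][ time-derived : 56 bits ].
public class SessionIds {
    public static long initializeNextSession(long serverId, long nowMillis) {
        // Shifting left 24 then (arithmetic) right 8 drops the top 24 bits
        // of the timestamp and leaves it in the low 56 bits. Note that the
        // arithmetic >> can sign-extend into the top byte for large enough
        // timestamps, which is arguably part of what makes the original
        // code hard to trust at a glance.
        long nextSid = (nowMillis << 24) >> 8;
        // Place the server id in the most significant byte.
        nextSid = nextSid | (serverId << 56);
        return nextSid;
    }

    // Recovers the server id from the high byte (logical shift).
    public static long serverIdOf(long sessionId) {
        return sessionId >>> 56;
    }
}
```

With a timestamp small enough to avoid sign extension, the high byte round-trips the server id exactly.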
| ZooKeeper | ZOOKEEPER-1227 | ZOOKEEPER-1263 Zookeeper logs is showing -1 as min/max session timeout if there is no sessiontimeout value configured |
Sub-task | Resolved | Minor | Fixed | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 14/Oct/11 10:04 | 25/Mar/14 17:15 | 25/Mar/14 17:15 | 3.3.3 | 3.5.0 | server | 0 | 1 | ZOOKEEPER-1213 | When starting the ZooKeeper without configuring 'minimumSessionTimeOut' and 'maximumSessionTimeOut'. I'm seeing the '-1' as the lower and the upper bound, instead it should give the default values : tickTime*2 and tickTime*20 {noformat} 2011-10-14 13:07:18,761 - INFO [main:QuorumPeerConfig@92] - Reading configuration from: /home/amith/CI/source/install/zookeeper/zookeeper1/bin/../conf/zoo.cfg 2011-10-14 13:07:19,118 - INFO [main:QuorumPeer@834] - tickTime set to 2000 2011-10-14 13:07:19,119 - INFO [main:QuorumPeer@845] - minSessionTimeout set to -1 2011-10-14 13:07:19,119 - INFO [main:QuorumPeer@856] - maxSessionTimeout set to -1 {noformat} *Suggestion* Move the defaulting logic to the QuorumPeerConfig instead of doing in the QuorumPeer |
85590 | No Perforce job exists for this issue. | 1 | 42027 | 6 years, 2 days ago |
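ZOOKEEPER-1227 above reports -1 being logged when no session timeout bounds are configured, instead of the documented defaults tickTime*2 and tickTime*20. The defaulting logic it suggests moving into QuorumPeerConfig can be sketched as (method names here are illustrative):

```java
// -1 means "not configured"; fall back to the tickTime-derived defaults
// instead of propagating -1 into the logs and the running server.
public class SessionTimeouts {
    public static int minSessionTimeout(int configured, int tickTime) {
        return configured == -1 ? tickTime * 2 : configured;
    }

    public static int maxSessionTimeout(int configured, int tickTime) {
        return configured == -1 ? tickTime * 20 : configured;
    }
}
```

Resolving the defaults at config-parse time means every later consumer (logging included) sees the effective bounds rather than the sentinel.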
| ZooKeeper | ZOOKEEPER-1226 | ZOOKEEPER-1198 extract version check in separate method in PrepRequestProcessor |
Sub-task | Resolved | Major | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 14/Oct/11 07:23 | 18/Oct/11 07:13 | 18/Oct/11 07:13 | 0 | 1 | The following code is repeated 4 times and should be put in a method that either throws the Exception or returns the incremented version (see below). {code} version = setDataRequest.getVersion(); int currentVersion = nodeRecord.stat.getVersion(); if (version != -1 && version != currentVersion) { throw new KeeperException.BadVersionException(path); } version = currentVersion + 1; {code} {code} private static int checkAndIncVersion(int currentVersion, int versionToCompare, String path ) {code} |
85562 | No Perforce job exists for this issue. | 1 | 33312 | 8 years, 23 weeks, 2 days ago |
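The repeated version-check block that ZOOKEEPER-1226 extracts can be sketched as a single helper. To keep the example self-contained, a local unchecked exception stands in for KeeperException.BadVersionException (the real one is checked):

```java
// One helper replaces the four copies of the version-check block in
// PrepRequestProcessor: it either throws or returns the incremented version.
public class VersionCheck {
    static class BadVersionException extends RuntimeException {
        BadVersionException(String path) { super(path); }
    }

    static int checkAndIncVersion(int currentVersion, int expectedVersion, String path) {
        // -1 means "any version"; otherwise the expected version must match.
        if (expectedVersion != -1 && expectedVersion != currentVersion) {
            throw new BadVersionException(path);
        }
        return currentVersion + 1;
    }
}
```

Each call site then shrinks to `version = checkAndIncVersion(nodeRecord.stat.getVersion(), setDataRequest.getVersion(), path);`.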
| ZooKeeper | ZOOKEEPER-1225 | Successive invocation of LeaderElectionSupport.start() will bring the ELECTED node to READY and cause no one in ELECTED state. |
Bug | Patch Available | Major | Unresolved | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 13/Oct/11 06:36 | 05/Feb/20 07:12 | 3.3.3 | 3.7.0, 3.5.8 | recipes | 0 | 2 | Presently there is no state validation for the start() api, so one can invoke it multiple times consecutively. A second or further invocation makes the client node transition to the 'READY' state. Because an offer was already created during the first invocation of start(), the second invocation calls makeOffer() again and, after determination, the node ends up in the READY state. This leads to a situation with no 'ELECTED' node present, and the client (or the user of the election recipe) will wait indefinitely for the 'ELECTED' node. Similarly, stop() can be invoked without any state validation, which can dispatch unnecessary FAILED transition events. IMO, the LES recipe should have validation logic to avoid successive start() and stop() invocations. |
85331 | No Perforce job exists for this issue. | 1 | 2558 | 3 years, 39 weeks, 2 days ago |
| ZooKeeper | ZOOKEEPER-1224 | problem across zookeeper clients when reading data written by other clients |
Bug | Resolved | Minor | Not A Problem | Laxman | amith | amith | 13/Oct/11 00:54 | 18/Oct/11 04:09 | 18/Oct/11 04:09 | 3.3.0 | 3.5.0 | java client | 0 | 2 | 2419200 | 2419200 | 0% | Zookeeper console client (i.e, zkCli.sh ) and ZkClient with 3 zookeeper quorum |
create a java client create a persistent node using that client write data into the node like.. ZkClient zk = new ZkClient ( getZKServers () ); zk.createPersistent ( "/amith" , true ); zk.writeData ( "/amith", "amith" ); Object readData = zk.readData ( "/amith" ); LOGGER.logInfo (readData); zk.delete ( "/amith" ); and try to read the same using ZkCli.sh console client [zk: XXX.XXX.XXX.XXX:XXXXX(CONNECTED) 2] get /amith ��tamith cZxid = 0x100000004 ctime = Wed Oct 12 10:13:15 CST 2011 mZxid = 0x100000005 mtime = Wed Oct 12 10:13:15 CST 2011 pZxid = 0x100000004 cversion = 0 dataVersion = 1 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 12 numChildren = 0 the data is displayed as ��tamith, which includes some unwanted chars |
0% | 0% | 2419200 | 2419200 | 85289 | No Perforce job exists for this issue. | 0 | 32657 | 8 years, 23 weeks, 2 days ago | 0|i05yb3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1223 | C recipes includes <zookeeper.h> instead of <zookeeper/zookeeper.h> |
Bug | Open | Trivial | Unresolved | Unassigned | June Fang | June Fang | 12/Oct/11 23:51 | 29/Dec/11 11:17 | 3.3.3 | recipes | 0 | 1 | 7200 | 7200 | 0% | CentOS 5 | According to ZOOKEEPER-1033, headers will be installed into the "PREFIX/zookeeper" directory. I guess these includes may also need to be changed? |
0% | 0% | 7200 | 7200 | 85284 | No Perforce job exists for this issue. | 0 | 32658 | 8 years, 13 weeks ago | 0|i05ybb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1222 | getACL should only call DataTree.copyStat when passed in stat is not null |
Bug | Resolved | Minor | Fixed | Michi Mutsuzaki | Camille Fournier | Camille Fournier | 12/Oct/11 17:08 | 08/Jul/14 17:17 | 08/Jul/14 14:33 | 3.3.3, 3.4.0 | 3.4.7, 3.5.0 | java client | 0 | 5 | getACL(String, Stat) should allow the stat object to be null in the case that the user doesn't care about getting the stat back, as per other methods with similar syntax | 84819 | No Perforce job exists for this issue. | 3 | 32659 | 5 years, 37 weeks, 2 days ago | 0|i05ybj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
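The null-tolerant behaviour requested for getACL(String, Stat) can be illustrated with a minimal sketch (the `Stat` stand-in and the helper names below are hypothetical, not ZooKeeper's real classes):

```java
// Hedged sketch: only call the copy step when the caller actually supplied
// a Stat to fill, mirroring the fix the report asks for in DataTree.copyStat.
class Stat {
    int version;
}

public class GetAclSketch {
    // Stand-in for the stat the server computed alongside the ACL.
    static Stat serverStat() {
        Stat s = new Stat();
        s.version = 7;
        return s;
    }

    // Skip the copy entirely when dst is null: a caller who doesn't care
    // about the stat must not trigger a NullPointerException.
    static void copyStatIfRequested(Stat src, Stat dst) {
        if (dst != null) {
            dst.version = src.version;
        }
    }

    public static void main(String[] args) {
        copyStatIfRequested(serverStat(), null); // must not throw
        Stat out = new Stat();
        copyStatIfRequested(serverStat(), out);
        System.out.println(out.version);
    }
}
```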
| ZooKeeper | ZOOKEEPER-1221 | ZOOKEEPER-1198 Provide accessors for Request.{hdr|txn} |
Sub-task | Resolved | Minor | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 12/Oct/11 09:20 | 21/Oct/11 06:55 | 20/Oct/11 14:03 | 3.5.0 | 0 | 0 | I'm working on a larger patch that makes the Request class immutable. To see where the hdr and txn fields are modified, it helped to introduce accessor methods. The JVM should happily inline the method calls, so no performance overhead is to be expected. There's a minor, unrelated change included: ToBeAppliedRequestProcessor had a reference to the toBeApplied list of the Leader, so it was hard to find all places where this list was actually modified. The patch instead gives the leader instance to the ToBeAppliedRequestProcessor, and the processor then accesses leader.toBeApplied. |
74143 | No Perforce job exists for this issue. | 3 | 33313 | 8 years, 22 weeks, 6 days ago |
Reviewed
|
0|i062cv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1220 | ./zkCli.sh 'create' command is throwing ArrayIndexOutOfBoundsException |
Bug | Resolved | Major | Fixed | kavita sharma | kavita sharma | kavita sharma | 12/Oct/11 06:47 | 15/Dec/11 06:58 | 14/Dec/11 17:59 | 3.3.3 | 3.5.0 | scripts | 0 | 4 | A few problems while executing the create command. If we give a command like 1) [zk: localhost:2181(CONNECTED) 0] create -s -e /node1 {noformat} Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4 at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:692) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:593) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:365) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:323) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:282) {noformat} but actually it should create an ephemeral sequential node. 2) [zk: localhost:2181(CONNECTED) 0] create -s -e {noformat} Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3 {noformat} here it should print the list of commands, which is the default behaviour of zkCli for invalid/incomplete commands. 3) [zk: localhost:2181(CONNECTED) 3] create -s -e "data" {noformat} Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4 {noformat} here the command is wrong, so it should print the list of commands. 4) [zk: localhost:2181(CONNECTED) 0] create /node1 zkCli treats this as an invalid command because of the args.length check (3), but the behaviour should be: if the user hasn't given any of the options, it should create a persistent node. {noformat} if (cmd.equals("create") && args.length >= 3) { int first = 0; CreateMode flags = CreateMode.PERSISTENT; {noformat} |
73885 | No Perforce job exists for this issue. | 4 | 32660 | 8 years, 15 weeks ago |
Reviewed
|
0|i05ybr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1219 | LeaderElectionSupport recipe is unnecessarily dispatching the READY_START event even if the ELECTED node stopped/expired simultaneously. |
Improvement | Resolved | Major | Fixed | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 11/Oct/11 08:29 | 30/Mar/14 03:07 | 29/Mar/14 19:02 | 3.3.3 | 3.5.0 | recipes | 0 | 5 | Let's say a node has been determined as READY and has dispatched the DETERMINE_COMPLETE event, and at the same time the ELECTED node got stopped or expired. The f/w still first dispatches the READY_START event to the node and only then checks whether the ELECTED node exists() or not. Here it finds there is no 'Stat' corresponding to ELECTED and goes back to the leader determination phase. *Problem:* The READY_START event is unnecessarily dispatched to the node, telling it to be ready with the startup/init, even though there is no ELECTED node. *Proposal:* Reverse the logic: first check whether the ELECTED node exists() or not, and on success the f/w can dispatch the READY_START event. Otherwise go to the leader determination phase. |
59118 | No Perforce job exists for this issue. | 1 | 2556 | 5 years, 51 weeks, 4 days ago | 0|i00sjj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1218 | zktreeutil tool enhancement |
Improvement | Open | Major | Unresolved | Anirban Roy | Anirban Roy | Anirban Roy | 08/Oct/11 04:57 | 14/Dec/19 06:08 | 3.4.0 | 3.7.0 | contrib | 14/Oct/11 | 2 | 3 | 604800 | 0 | 604800 | 100% | GNU/Linux i386/i686/x64_84 | ============================================ zktreeutil - Zookeeper Tree Data Utility Author: Anirban Roy (r_anirban at yahoo.com) Organization: Yahoo Inc. ============================================ The zktreeutil program is intended to manage and manipulate zk-tree data quickly, efficiently and with ease. The utility operates on a free-form ZK-tree and hence can be used for any cluster managed by Zookeeper. Here are the basic functionalities - EXPORT: The whole/partial ZK-tree is exported into an XML file. This helps in capturing a current snapshot of the data for backup/analysis. For a subtree export, one needs to specify the path to the ZK-subtree with the proper option. Since Zookeeper stores binary data against znodes, the data dumped into the XML file is base64 encoded with an attribute "encode=true". Optionally one may specify not to encode data with the --noencode option if the data stored on zookeeper is guaranteed to be text data. IMPORT: The ZK (sub)tree can be imported from XML into the ZK cluster. This helps in priming a new ZK cluster with static configuration. The import can be non-intrusive by making only additions and modifications in the existing data. One may optionally delete the existing (sub)tree before importing the new data with the --force option. The znodes which carry an attribute "encode=true" will be decoded and written to zookeeper. DIFF: Creates a diff between live ZK data vs data saved in the XML file. Diff can ignore some ZK-tree branches (possibly dynamic data) on reading the optional ignore flag from the XML file. Taking a diff on a ZK-subtree is achieved by providing the path to the ZK-subtree with the diff command. UPDATE: Makes the incremental changes in the live ZK-tree from the saved XML, essentially after running the diff. 
DUMP: Dumps the ZK (sub)tree to the standard output device, reading either from a live ZK server or an XML file. The exported ZK data in the XML file can be shortened by only keeping the static ZK nodes which are required to prime an application. The dynamic zk nodes (created on-the-fly) can be ignored by setting an 'ignore' attribute at the root node of the dynamic subtree (see tests/zk_sample.xml), possibly deleting all inner ZK nodes under that. Once ignored, the whole subtree is ignored during DIFF, UPDATE and WRITE. Pre-requisites -------------- 1. Linux system with 2.6.X kernel. 2. Zookeeper C client library (locally built at ../../c/.libs) >= 3.X.X 3. Development build libraries (rpm packages): a. boost-devel >= 1.32.0 b. libxml2-devel >= 2.6.26 c. log4cxx-devel >= 0.9.7-7 d. openssl-devel >= 0.9.7a e. cppunit >= 1.12.0-2 Build instructions ------------------ 1. cd into this directory 2. autoreconf -if 3. ./configure # Configure the build env 4. make # Build the tool 5. make check # Run unit-tests 6. ./src/zktreeutil --help # Usage help Testing and usage of zktreeutil -------------------------------- 1. Run Zookeeper server locally on port 2181 2. export LD_LIBRARY_PATH=../../c/.libs/:/usr/local/lib/ 3. ./src/zktreeutil --help # show help 4. ./src/zktreeutil --zookeeper=localhost:2181 --import --xmlfile=tests/zkdata_test.xml 2>/dev/null # import sample ZK tree 5. ./src/zktreeutil --zookeeper=localhost:2181 --dump --path=/myapp/version-1.0 2>/dev/null # dump Zk subtree 6. ./src/zktreeutil --zookeeper=localhost:2181 --dump --depth=3 2>/dev/null # dump Zk tree till certain depth 7. ./src/zktreeutil --xmlfile=zkdata_test.xml -D 2>/dev/null # dump the xml data 8. Change zkdata_test.xml by adding/deleting/changing some nodes 9. ./src/zktreeutil -z localhost:2181 -F -x zkdata_test.xml -p /myapp/version-1.0/configuration 2>/dev/null # take a diff of changes 10. ./src/zktreeutil -z localhost:2181 -E --noencode 2>/dev/null > zk_sample2.xml # export the modified ZK tree 11. 
./src/zktreeutil -z localhost:2181 -U -x zkdata_test.xml -p /myapp/version-1.0/distributions 2>/dev/null # update with incr. changes 12. ./src/zktreeutil --zookeeper=localhost:2181 --import --force --xmlfile=zk_sample2.xml 2>/dev/null # re-prime the ZK tree For more details of usage, please see the unit tests. Hope this helps. Please reach out to me for any bugs, comments or suggestions. |
100% | 100% | 604800 | 0 | 604800 | patch | 50570 | No Perforce job exists for this issue. | 1 | 2509 | 5 years, 51 weeks, 3 days ago | 1. Export/import capability of binary data 2. Null data handling 3. Export/import/dump/diff capability of subtree 4. Efficient subtree handling 5. Improved logging 6. Improved testability with unittests 7. Option to dump/export on file 8. Fix to handle new state introduced in ZOOKEEPER-1108 |
zktreeutil | 0|i00s93: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1217 | ZOOKEEPER-1198 Remove unnecessary MissingSessionException in ZooKeeperServer |
Sub-task | Patch Available | Minor | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 07/Oct/11 10:59 | 29/Oct/11 06:39 | 0 | 1 | MissingSessionException is only thrown and caught once inside this class and can just as well be replaced by a boolean return value. While I'm at it: the method throwing this exception makes more sense inlined at the one place from which it is called. |
50304 | No Perforce job exists for this issue. | 5 | 42028 | 8 years, 21 weeks, 5 days ago | 0|i07k4n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1216 | ZOOKEEPER-1198 Fix more eclipse compiler warnings, also in Tests |
Sub-task | Resolved | Minor | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 07/Oct/11 10:06 | 25/Oct/11 06:56 | 25/Oct/11 02:08 | 3.5.0 | 0 | 1 | I did set up a new work environment for a presentation of Eclipse+EGit+Gerrit+Jenkins and found more warnings that were ignored on my machine. Warnings are now down to 5! So no excuses to introduce new ones! Fixed warnings: - removed unused imports - removed unused variables / methods - added missing generics - added ignore warnings for calls to deprecated code in tests |
50193 | No Perforce job exists for this issue. | 4 | 33314 | 8 years, 22 weeks, 2 days ago |
Reviewed
|
0|i062d3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1215 | C client persisted cache |
New Feature | Open | Major | Unresolved | Marc Celani | Marc Celani | Marc Celani | 06/Oct/11 17:19 | 21/Dec/11 12:05 | c client | 1 | 3 | Motivation: 1. Reduce the impact of client restarts on zookeeper by implementing a persisted cache, and only fetching deltas on restart 2. Reduce unnecessary calls to zookeeper. 3. Improve performance of gets by caching on the client 4. Allow for larger caches than in-memory caches. Behavior Change: Zookeeper clients will now have the option to specify a folder path where they can cache zookeeper gets. If they do choose to cache results, the zookeeper library will check the persisted cache before actually sending a request to zookeeper. Watches will automatically be placed on all gets in order to invalidate the cache. Alternatively, we can add a cache flag to the get API - thoughts? On reconnect or restart, zookeeper clients will check the version number of each entry in the persisted cache, and will invalidate any old entries. In checking version numbers, zookeeper clients will also place a watch on those files. Regarding watches, client watch handlers will not fire until the invalidation step is completed, which may slow down client watch handling. Since setting up watches on all files is necessary on initialization, initialization will likely slow down as well. API Change: The zookeeper library will expose a new init interface that specifies a folder path to the cache. A new get API will specify whether or not to use the cache, and whether or not stale data is safe to return if the connection is down. Design: The zookeeper handler structure will now include a cache_root_path (possibly null) string to cache all gets, as well as a bool for whether or not it is okay to serve stale data. Old API calls will default to a null path (which signifies no cache), and signify that it is not okay to serve stale data. The cache will be located at a cache_root_path. All files will be placed at cache_root_path/file_path. 
The cache will be an incomplete copy of everything that is in zookeeper, but everything in the cache will have the same relative path from the cache_root_path that it has as a path in zookeeper. Each file in the cache will include the Stat structure and the file contents. zoo_get will check the zookeeper handler to determine whether or not it has a cache. If it does, it will first go to the path to the persisted cache and append the get path. If the file exists and it is not invalidated, the zookeeper client will read it and return its value. If the file does not exist or is invalidated, the zookeeper library will perform the same get as is currently designed. After getting the results, the library will place the value in the persisted cache for subsequent reads. zoo_set will automatically invalidate the path in the cache. If caching is requested, then on each zoo_get that goes through to zookeeper, a watch will be placed on the path. A cache watch handler will handle all watch events by invalidating the cache, and placing another watch on it. Client watch handlers will handle the watch event after the cache watch handler. The cache watch handler will not call zoo_get, because it is assumed that the client watch handlers will call zoo_get if they need the fresh data as soon as it is invalidated (which is why the cache watch handler must be executed first). All updates to the cache will be done on a separate thread, but will be queued in order to maintain consistency in the cache. In addition, all client watch handlers will not be fired until the cache watch handler completes its invalidation write in order to ensure that client calls to zoo_get in the watch event handler are done after the invalidation step. This means that a client watch handler could be waiting on SEVERAL writes before it can be fired off, since all writes are queued. When a new connection is made, if a zookeeper handler has a cache, then that cache will be scanned in order to find all leaf nodes. 
Calls will be made to zookeeper to check if all of these nodes still exist, and if they do, what their version number is. Any inconsistencies in version will result in the cache invalidating the out of date files. Any files that no longer exist will be deleted from the cache. If a connection fails, and a zoo_get call is made on a zookeeper handler that has a cache associated with it, and that cache tolerates stale data, then the stale data will be returned from cache - otherwise, all zoo_gets will error out as they do today. |
49786 | No Perforce job exists for this issue. | 0 | 42029 | 8 years, 20 weeks, 1 day ago | 0|i07k4v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1214 | QuorumPeer should unregister only its previously registered MBeans instead of using the MBeanRegistry.unregisterAll() method. |
Bug | Resolved | Major | Fixed | César Álvarez Núñez | César Álvarez Núñez | César Álvarez Núñez | 05/Oct/11 05:56 | 20/May/14 07:09 | 17/May/14 23:49 | 3.5.0 | quorum | 0 | 6 | When a QuorumPeer thread dies, it unregisters *all* ZKMBeanInfo MBeans previously registered in its java process, including those that were not registered by itself. This does not cause any side effect in a production environment where each server is running in a separate java process, but it fails when using "org.apache.zookeeper.test.QuorumUtil" to programmatically start up a zookeeper server ensemble and use its provided methods to force Disconnected, SyncConnected or SessionExpired events, in order to perform some basic/functional testing. Scenario: * QuorumUtil qU = new QuorumUtil(1); // It creates a 3 servers ensemble. * qU.startAll(); // Startup all servers: 1 Leader + 2 Followers * qU.shutdown\(i\); // i is a number from 1 to 3. It shuts down one server. The last method causes a QuorumPeer to die, invoking the MBeanRegistry.unregisterAll() method. As a result, *all* ZKMBeanInfo MBeans are unregistered, including those belonging to the other QuorumPeer instances. When trying to restart the previous server (qU.restart\(i\)) an AssertionError is thrown at the MBeanRegistry.register(ZKMBeanInfo bean, ZKMBeanInfo parent) method, causing the QuorumPeer thread to die. To solve it: * The MBeanRegistry.unregisterAll() method has been removed. * QuorumPeer only unregisters its own ZKMBeanInfo MBeans. |
46382 | No Perforce job exists for this issue. | 6 | 32661 | 5 years, 44 weeks, 2 days ago | 0|i05ybz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1213 | ZOOKEEPER-1263 ZooKeeper server startup fails if configured only with the 'minSessionTimeout' and not 'maxSessionTimeout' |
Sub-task | Resolved | Major | Fixed | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 04/Oct/11 10:02 | 25/Mar/14 17:15 | 25/Mar/14 17:15 | 3.3.3 | 3.5.0 | server | 0 | 1 | ZOOKEEPER-1227 | I have configured only the 'minSessionTimeout' and not configured 'maxSessionTimeout' in the zoo.cfg file as follows +zoo.cfg+ tickTime=2000 minSessionTimeout=10000 I'm seeing the following exception and not starting the ZooKeeper server {noformat} 2011-10-07 23:39:10,546 - INFO [main:QuorumPeerConfig@100] - Reading configuration from: /home/rakeshr/zookeeper/bin/../conf/zoo.cfg 2011-10-07 23:39:12,334 - ERROR [main:QuorumPeerMain@85] - Invalid config, exiting abnormally org.apache.zookeeper.server.quorum.QuorumPeerConfig$ConfigException: Error processing /home/rakeshr/zookeeper/bin/../conf/zoo.cfg at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:120) at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:101) at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78) Caused by: java.lang.IllegalArgumentException: minSessionTimeout must not be larger than maxSessionTimeout at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parseProperties(QuorumPeerConfig.java:265) at org.apache.zookeeper.server.quorum.QuorumPeerConfig.parse(QuorumPeerConfig.java:116) ... 2 more {noformat} Startup fails due to the following validation. Here maxSessionTimeout value is -1 rather than the upper limit (tickTime * 2) {noformat} /** defaults to -1 if not set explicitly */ protected int maxSessionTimeout = -1; if (minSessionTimeout > maxSessionTimeout) { throw new IllegalArgumentException( "minSessionTimeout must not be larger than maxSessionTimeout"); } {noformat} |
44362 | No Perforce job exists for this issue. | 1 | 42030 | 6 years, 2 days ago | 0|i07k53: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
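The fix implied by the report above — resolving the -1 "not configured" sentinel to tickTime-based defaults *before* validating — can be sketched as follows (the method name is illustrative; the 2x/20x factors follow ZooKeeper's conventional session-timeout defaults, stated here as an assumption):

```java
// Hedged sketch modeled on QuorumPeerConfig's validation: apply defaults
// first, so configuring only minSessionTimeout no longer trips the
// "minSessionTimeout must not be larger than maxSessionTimeout" check
// against a raw -1.
public class SessionTimeoutDefaults {
    static int[] resolve(int tickTime, int minSessionTimeout, int maxSessionTimeout) {
        // -1 means "not set explicitly"; fall back to tickTime-based defaults.
        int min = (minSessionTimeout == -1) ? tickTime * 2 : minSessionTimeout;
        int max = (maxSessionTimeout == -1) ? tickTime * 20 : maxSessionTimeout;
        if (min > max) {
            throw new IllegalArgumentException(
                "minSessionTimeout must not be larger than maxSessionTimeout");
        }
        return new int[] { min, max };
    }

    public static void main(String[] args) {
        // zoo.cfg from the report: tickTime=2000, minSessionTimeout=10000, no max.
        int[] r = resolve(2000, 10000, -1);
        System.out.println(r[0] + " " + r[1]); // 10000 40000
    }
}
```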
| ZooKeeper | ZOOKEEPER-1212 | zkServer.sh stop action is not conformant with LSB para 20.2 Init Script Actions |
Bug | Closed | Major | Fixed | Roman Shaposhnik | Roman Shaposhnik | Roman Shaposhnik | 03/Oct/11 16:24 | 23/Nov/11 14:22 | 19/Oct/11 02:42 | 3.3.3, 3.4.0, 3.5.0 | 3.3.4, 3.4.0, 3.5.0 | scripts | 0 | 1 | According to LSB Core para 20.2: ================================================================================== Otherwise, the exit status shall be nonzero, as defined below. In addition to straightforward success, the following situations are also to be considered successful: • restarting a service (instead of reloading it) with the force-reload argument • running start on a service already running • running stop on a service already stopped or not running • running restart on a service already stopped or not running • running try-restart on a service already stopped or not running ================================================================================== Yet, zkServer.sh fails on stop if it can't find a PID file: {noformat} stop) echo -n "Stopping zookeeper ... " if [ ! -f "$ZOOPIDFILE" ] then echo "error: could not find file $ZOOPIDFILE" exit 1 else $KILL -9 $(cat "$ZOOPIDFILE") rm "$ZOOPIDFILE" echo STOPPED exit 0 fi {noformat} |
43879 | No Perforce job exists for this issue. | 2 | 30001 | 8 years, 23 weeks, 1 day ago | 0|i05hxb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
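An LSB-conformant stop branch along the lines the report asks for might look like this (a sketch, not the actual zkServer.sh; the PID-file path is illustrative):

```shell
#!/bin/sh
# Hedged sketch of an LSB-conformant stop action: per LSB 20.2, running
# stop on a service that is already stopped must still be reported as
# success (exit 0), so a missing PID file is not an error.
ZOOPIDFILE="${ZOOPIDFILE:-/var/run/zookeeper/zookeeper_server.pid}"

stop_zookeeper() {
  printf "Stopping zookeeper ... "
  if [ ! -f "$ZOOPIDFILE" ]; then
    # Already stopped (or never started): success, not failure.
    echo "no zookeeper to stop (could not find file $ZOOPIDFILE)"
    return 0
  fi
  kill -9 "$(cat "$ZOOPIDFILE")" 2>/dev/null
  rm -f "$ZOOPIDFILE"
  echo STOPPED
  return 0
}
```

The only behavioural change from the quoted script is the `return 0` in the missing-PID-file branch, which is exactly the non-conformance the report points out.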
| ZooKeeper | ZOOKEEPER-1211 | C client's package name |
Bug | Resolved | Trivial | Duplicate | Unassigned | June Fang | June Fang | 30/Sep/11 05:07 | 12/Oct/11 22:28 | 12/Oct/11 22:28 | 3.3.3 | c client | 0 | 0 | 3600 | 3600 | 0% | centos 5 | The package name of the c client is "c-client-src", which leads the include files to be installed to /usr/local/include/c-client-src. It's a bit annoying since users need to manually rename it to zookeeper. I think there are two fixes: 1) change the autoconf package name to "zookeeper", so the headers will be installed to the zookeeper subdir, which is consistent with the README; 2) change pkginclude_HEADER to include_HEADER, which will install headers to /usr/local/include. |
0% | 0% | 3600 | 3600 | 40983 | No Perforce job exists for this issue. | 0 | 32662 | 8 years, 24 weeks ago | 0|i05yc7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1210 | Can't build ZooKeeper RPM with RPM >= 4.6.0 (i.e. on RHEL 6 and Fedora >= 10) |
Bug | Resolved | Minor | Fixed | Tadeusz Andrzej Kadłubowski | Tadeusz Andrzej Kadłubowski | Tadeusz Andrzej Kadłubowski | 28/Sep/11 10:20 | 30/Jun/12 07:01 | 30/Jun/12 02:19 | 3.4.0 | 3.3.6, 3.4.4 | build | 0 | 6 | Tested to fail on both Centos 6.0 and Fedora 14 | I was trying to build the zookeeper RPM (basically, `ant rpm -Dskip.contrib=1`), using build scripts that were recently merged from the work on the ZOOKEEPER-999 issue. The final stage, i.e. running rpmbuild failed. From what I understand it mixed BUILD and BUILDROOT subdirectories in /tmp/zookeeper_package_build_tkadlubo/, leaving BUILDROOT empty, and placing everything in BUILD. The full build log is at http://pastebin.com/0ZvUAKJt (Caution: I cut out long file listings from running tar -xvvf). |
patch | 36676 | No Perforce job exists for this issue. | 3 | 32663 | 7 years, 38 weeks, 5 days ago | Fix buildroot misplacement on systems with RPM>=4.6. Earlier RPM versions support --buildroot commandline flag, so this doesn't break anything on older systems. |
Reviewed
|
rpm ant | 0|i05ycf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1209 | LeaderElection recipe doesn't handle the split-brain issue, n/w disconnection can bring both the client nodes to be in ELECTED |
Bug | Patch Available | Major | Unresolved | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 28/Sep/11 02:53 | 05/Feb/20 07:11 | 3.3.3 | 3.7.0, 3.5.8 | recipes | 0 | 5 | *Case 1:* An n/w disconnection can bring both client nodes into the ELECTED state. The current LeaderElectionSupport (LES) f/w handles only 'NodeDeleted' events. Consider the scenario where ELECTED and READY nodes are running. Say the ELECTED node's n/w failed and it is "Disconnected" from ZooKeeper. It will still behave as ELECTED, since it is not getting any events from the LES framework. After the session timeout, the node in the READY state will be notified by a 'NodeDeleted' event and will go to the ELECTED state. *Problem:* Both nodes become ELECTED, so the user finally sees two master (ELECTED) nodes, causing inconsistencies. *Case 2:* Similarly, let's say the user has started only one client node and it becomes ELECTED. After some time the n/w is disconnected from the ZooKeeper server and the session expires. *Problem:* The client node will still be in the ELECTED state, and if the user later starts a second client node, again both nodes become ELECTED. |
34398 | No Perforce job exists for this issue. | 1 | 2557 | 3 years, 39 weeks, 2 days ago | 0|i00sjr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1208 | Ephemeral node not removed after the client session is long gone |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | kishore gopalakrishna | kishore gopalakrishna | 28/Sep/11 00:35 | 23/Nov/11 14:22 | 14/Nov/11 14:34 | 3.3.3 | 3.3.4, 3.4.0, 3.5.0 | 3 | 12 | Copying from email thread. We found our ZK server in a state where an ephemeral node still exists after a client session is long gone. I used the cons command on each ZK host to list all connections and couldn't find the ephemeralOwner id. We are using ZK 3.3.3. Has anyone seen this problem? I got the following information from the logs. The node that still exists is /kafka-tracking/consumers/UserPerformanceEvent-<host>/owners/UserPerformanceEvent/529-7 I saw that the ephemeral owner is 86167322861045079 which is session id 0x13220b93e610550. After searching in the transaction log of one of the ZK servers found that session expired 9/22/11 12:17:57 PM PDT session 0x13220b93e610550 cxid 0x74 zxid 0x601bd36f7 closeSession null On digging further into the logs I found that there were multiple sessions created in quick succession and every session tried to create the same node. 
But i verified that the sessions were closed and opened in order 9/22/11 12:17:56 PM PDT session 0x13220b93e610550 cxid 0x0 zxid 0x601bd36b5 createSession 6000 9/22/11 12:17:57 PM PDT session 0x13220b93e610550 cxid 0x74 zxid 0x601bd36f7 closeSession null 9/22/11 12:17:58 PM PDT session 0x13220b93e610551 cxid 0x0 zxid 0x601bd36f8 createSession 6000 9/22/11 12:17:59 PM PDT session 0x13220b93e610551 cxid 0x74 zxid 0x601bd373a closeSession null 9/22/11 12:18:00 PM PDT session 0x13220b93e610552 cxid 0x0 zxid 0x601bd373e createSession 6000 9/22/11 12:18:01 PM PDT session 0x13220b93e610552 cxid 0x6c zxid 0x601bd37a0 closeSession null 9/22/11 12:18:02 PM PDT session 0x13220b93e610553 cxid 0x0 zxid 0x601bd37e9 createSession 6000 9/22/11 12:18:03 PM PDT session 0x13220b93e610553 cxid 0x74 zxid 0x601bd382b closeSession null 9/22/11 12:18:04 PM PDT session 0x13220b93e610554 cxid 0x0 zxid 0x601bd383c createSession 6000 9/22/11 12:18:05 PM PDT session 0x13220b93e610554 cxid 0x6a zxid 0x601bd388f closeSession null 9/22/11 12:18:06 PM PDT session 0x13220b93e610555 cxid 0x0 zxid 0x601bd3895 createSession 6000 9/22/11 12:18:07 PM PDT session 0x13220b93e610555 cxid 0x6a zxid 0x601bd38cd closeSession null 9/22/11 12:18:10 PM PDT session 0x13220b93e610556 cxid 0x0 zxid 0x601bd38d1 createSession 6000 9/22/11 12:18:11 PM PDT session 0x13220b93e610557 cxid 0x0 zxid 0x601bd38f2 createSession 6000 9/22/11 12:18:11 PM PDT session 0x13220b93e610557 cxid 0x51 zxid 0x601bd396a closeSession null Here is the log output for the sessions that tried creating the same node 9/22/11 12:17:54 PM PDT session 0x13220b93e61054f cxid 0x42 zxid 0x601bd366b create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7 9/22/11 12:17:56 PM PDT session 0x13220b93e610550 cxid 0x42 zxid 0x601bd36ce create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7 9/22/11 12:17:58 PM PDT session 0x13220b93e610551 cxid 0x42 zxid 0x601bd3711 
create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7 9/22/11 12:18:00 PM PDT session 0x13220b93e610552 cxid 0x42 zxid 0x601bd3777 create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7 9/22/11 12:18:02 PM PDT session 0x13220b93e610553 cxid 0x42 zxid 0x601bd3802 create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7 9/22/11 12:18:05 PM PDT session 0x13220b93e610554 cxid 0x44 zxid 0x601bd385d create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7 9/22/11 12:18:07 PM PDT session 0x13220b93e610555 cxid 0x44 zxid 0x601bd38b0 create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7 9/22/11 12:18:11 PM PDT session 0x13220b93e610557 cxid 0x52 zxid 0x601bd396b create '/kafka-tracking/consumers/UserPerformanceEvent-<hostname>/owners/UserPerformanceEvent/529-7 Let me know if you need additional information. |
34379 | No Perforce job exists for this issue. | 4 | 32664 | 8 years, 19 weeks, 2 days ago | trunk version 1201832 | 0|i05ycn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1207 | strange ReadOnlyZooKeeperServer ERROR when starting ensemble |
Bug | Resolved | Critical | Invalid | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 27/Sep/11 16:04 | 25/Apr/14 15:28 | 25/Apr/14 15:28 | 3.5.0 | quorum, server | 0 | 1 | I'm seeing a strange ERROR message when starting an ensemble: {noformat} 2011-09-27 13:00:08,168 [myid:3] - ERROR [Thread-2:QuorumPeer$1@689] - FAILED to start ReadOnlyZooKeeperServer java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.zookeeper.server.quorum.QuorumPeer$1.run(QuorumPeer.java:684) {noformat} I did not specify ReadOnlyZooKeeperServer; also, why is this at ERROR level? I'm not sure what the expected behavior is here. Is r/o turned on by default? It seems we should have this as a config option, off by default. |
33620 | No Perforce job exists for this issue. | 0 | 32665 | 5 years, 47 weeks, 6 days ago | Thanks Rakesh. I'm closing this as invalid since this has been fixed by ZOOKEEPER-1268. | 0|i05ycv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1206 | Sequential node creation does not always use digits in node names under certain locales. |
Bug | Closed | Minor | Fixed | Mark Miller | Mark Miller | Mark Miller | 27/Sep/11 12:26 | 23/Nov/11 14:22 | 29/Sep/11 17:36 | 3.3.3 | 3.3.4, 3.4.0, 3.5.0 | server | 0 | 1 | While I always expect to be able to parse a sequential node by looking for digits, under some locales you end up with non-digits - for example: n_०००००००००० It looks like the problem is around line 236 in PrepRequestProcessor: {code} if (createMode.isSequential()) { path = path + String.format("%010d", parentCVersion); } {code} Instead we should pass Locale.ENGLISH to the format call. {code} if (createMode.isSequential()) { path = path + String.format(Locale.ENGLISH, "%010d", parentCVersion); } {code} Lucene/Solr tests with random Locales, and some of my tests that inspect the node name and order things expect to find digits - currently my leader election recipe randomly fails when the wrong locale pops up. |
19505 | No Perforce job exists for this issue. | 3 | 32666 | 8 years, 25 weeks, 6 days ago |
Reviewed
|
0|i05yd3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
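The locale pitfall in ZOOKEEPER-1206 above can be demonstrated outside ZooKeeper. This is a minimal sketch (class and method names here are illustrative, not ZooKeeper's actual code) of the fixed form: pinning Locale.ENGLISH so the sequence suffix is always rendered with ASCII digits, regardless of the JVM's default locale.

```java
import java.util.Locale;

public class SequentialNodeName {
    // Hypothetical helper mirroring the fix described above: always format
    // the 10-digit sequence suffix with an explicit locale so the digits
    // are ASCII and remain parseable by clients that look for \d+.
    static String appendSequence(String path, int parentCVersion) {
        return path + String.format(Locale.ENGLISH, "%010d", parentCVersion);
    }

    public static void main(String[] args) {
        // Under a default locale that uses non-ASCII digits, a bare
        // String.format("%010d", ...) can emit characters like ०
        // (as reported in the issue); the explicit locale avoids that.
        System.out.println(appendSequence("/election/n_", 42)); // /election/n_0000000042
    }
}
```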
| ZooKeeper | ZOOKEEPER-1205 | Add a unit test for Kerberos Ticket-Granting Ticket (TGT) renewal |
Improvement | Open | Major | Unresolved | Unassigned | Eugene Joseph Koontz | Eugene Joseph Koontz | 27/Sep/11 01:04 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | tests | 0 | 1 | ZOOKEEPER-1174, ZOOKEEPER-1181, HADOOP-8078, HDFS-3016 | Create a unit test to test Kerberos ticket renewal. Note that testing Kerberos-related functionality in Java requires that a default Kerberos configuration file be available. The location of this file can be set with the java.security.krb5.conf property (see http://download.oracle.com/javase/1.4.2/docs/guide/security/jgss/tutorials/KerberosReq.html ). For more background on Java and Kerberos, see http://download.oracle.com/javase/1.5.0/docs/guide/security/jgss/single-signon.html . For discussion about TGT renewal, see http://freeipa.org/page/Automatic_Ticket_Renewal . Mahadev Konar writes: "Mockito would be very helpful here." |
kerberos, security | 14952 | No Perforce job exists for this issue. | 0 | 42031 | 8 years, 4 weeks, 2 days ago | 0|i07k5b: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1204 | ZOOKEEPER-1198 Shorten calls to ZooTrace |
Sub-task | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 26/Sep/11 11:46 | 27/Oct/11 18:03 | 0 | 2 | The calls to ZooTrace are kind of verbose and contain duplicated logic. This patch makes the calls as short as possible so that they do not distract that much from what's actually going on. Calls to LOG.isTraceEnabled() are removed in many places, because this check is done anyway inside ZooTrace. In some places it has been left, to avoid costly message creation. |
17 | No Perforce job exists for this issue. | 2 | 42032 | 8 years, 23 weeks ago | 0|i07k5j: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1203 | Zookeeper systest is missing Junit Classes |
Bug | Closed | Major | Fixed | Prashant Gokhale | Prashant Gokhale | Prashant Gokhale | 23/Sep/11 18:44 | 23/Nov/11 14:22 | 29/Sep/11 14:27 | 3.3.4, 3.4.0, 3.5.0 | tests | 0 | 1 | For running these tests, I am following instructions on https://github.com/apache/zookeeper/blob/trunk/src/java/systest/README.txt In Step 4, when I try to run java -jar build/contrib/fatjar/zookeeper-<version>-fatjar.jar systest org.apache.zookeeper.test.system.SimpleSysTest , it throws the following error, Exception in thread "main" java.lang.NoClassDefFoundError: junit/framework/TestCase The problem is that zookeeper-dev-fatjar.jar does not contain the TestCase class. Patrick Hunt suggested that adding <zipgroupfileset dir="${zk.root}/build/test/lib" includes="*.jar" /> to fatjar/build.xml should solve the problem and it does. |
18 | No Perforce job exists for this issue. | 1 | 32667 | 8 years, 26 weeks ago |
Reviewed
|
0|i05ydb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1202 | Prevent certain state transitions in Java client on close(); improve exception handling and enhance client testability |
Improvement | Open | Major | Unresolved | Matthias Spycher | Matthias Spycher | Matthias Spycher | 22/Sep/11 21:56 | 14/Dec/19 06:07 | 3.4.0 | 3.7.0 | java client | 2 | 4 | ZOOKEEPER-126 | ZooKeeper.close() doesn't force the client into a CLOSED state. While the closing flag ensures that the client will close, its state may end up in CLOSED, CONNECTING or CONNECTED. I developed a patch and in the process cleaned up a few other things primarily to enable testing of state transitions. - ClientCnxnState is new and enforces certain state transitions - ZooKeeper.isExpired() is new - ClientCnxn no longer refers to ZooKeeper, WatchManager is externalized, and ClientWatchManager includes 3 new methods - The SendThread terminates the EventThread on a call to close() via the event-of-death - Polymorphism is used to handle internal exceptions (SendIOExceptions) - The patch incorporates ZOOKEEPER-126.patch and prevents close() from blocking |
19 | No Perforce job exists for this issue. | 1 | 2508 | 6 years, 1 day ago | Java client | 0|i00s8v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1201 | ZOOKEEPER-1198 Clean SaslServerCallbackHandler.java |
Sub-task | Closed | Blocker | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 22/Sep/11 14:02 | 23/Nov/11 14:22 | 29/Sep/11 03:41 | 3.4.0, 3.5.0 | 0 | 1 | ZOOKEEPER-1195 | Severe code style issues. | 20 | No Perforce job exists for this issue. | 2 | 33315 | 8 years, 26 weeks ago |
Reviewed
|
0|i062db: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1200 | ZOOKEEPER-1198 Remove obsolete DataTreeBuilder |
Sub-task | Resolved | Major | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 22/Sep/11 09:34 | 28/Oct/11 06:55 | 27/Oct/11 17:24 | 3.5.0 | 0 | 1 | There's a DataTreeBuilder thing in the whole type hierarchy of ZooKeeperServer classes, which is never used. | 21 | No Perforce job exists for this issue. | 3 | 33316 | 8 years, 21 weeks, 6 days ago |
Reviewed
|
0|i062dj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1199 | ZOOKEEPER-1198 Make OpCode an enum |
Sub-task | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 22/Sep/11 05:04 | 27/Oct/11 17:12 | 0 | 1 | ZooDefs.OpCode is an interface with integer constants. Changing this to an enum provides safety. See "Item 30: Use enums instead of int constants" in Effective Java. | 22 | No Perforce job exists for this issue. | 6 | 42033 | 8 years, 22 weeks ago | 0|i07k5r: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
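The refactoring proposed in ZOOKEEPER-1199 above can be sketched generically. This is an illustrative before/after (not ZooKeeper's actual ZooDefs code): an interface of int constants offers no type safety, while an enum carrying the same wire value does.

```java
public class OpCodeSketch {
    // Before: an interface of int constants, in the style the issue
    // describes; any int can be passed where an opcode is expected.
    interface OpCodeInts {
        int create = 1;
        int delete = 2;
    }

    // After: an enum keeps the wire value but is checked by the compiler,
    // per "Item 30: Use enums instead of int constants" in Effective Java.
    enum OpCode {
        CREATE(1), DELETE(2);

        private final int value;

        OpCode(int value) { this.value = value; }

        int getValue() { return value; }
    }

    public static void main(String[] args) {
        // A method taking OpCode can no longer receive an arbitrary int.
        System.out.println(OpCode.CREATE.getValue()); // 1
    }
}
```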
| ZooKeeper | ZOOKEEPER-1198 | Refactorings and Cleanups |
Improvement | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 22/Sep/11 05:00 | 27/Oct/11 02:41 | 0 | 1 | ZOOKEEPER-1199, ZOOKEEPER-1200, ZOOKEEPER-1201, ZOOKEEPER-1204, ZOOKEEPER-1216, ZOOKEEPER-1217, ZOOKEEPER-1221, ZOOKEEPER-1226, ZOOKEEPER-1228, ZOOKEEPER-1230, ZOOKEEPER-1231, ZOOKEEPER-1233, ZOOKEEPER-1234, ZOOKEEPER-1235, ZOOKEEPER-1244, ZOOKEEPER-1245, ZOOKEEPER-1246, ZOOKEEPER-1247, ZOOKEEPER-1248, ZOOKEEPER-1250, ZOOKEEPER-1251, ZOOKEEPER-1252, ZOOKEEPER-1253, ZOOKEEPER-1255, ZOOKEEPER-1257, ZOOKEEPER-1258, ZOOKEEPER-1259, ZOOKEEPER-1266, ZOOKEEPER-1267, ZOOKEEPER-1276, ZOOKEEPER-1279, ZOOKEEPER-1284, ZOOKEEPER-1286, ZOOKEEPER-1288 | Umbrella issue for refactorings. I'll post individual refactoring steps as sub-issues. I'll also use this umbrella issue to submit previews of the full refactoring for testing by Jenkins or to ReviewBoard. | 2381 | No Perforce job exists for this issue. | 0 | 42034 | 8 years, 22 weeks, 2 days ago | cleanup, cleancode | 0|i07k5z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1197 | Incorrect socket handling of 4 letter words for NIO |
Bug | Resolved | Critical | Won't Fix | Camille Fournier | Camille Fournier | Camille Fournier | 21/Sep/11 10:37 | 15/May/14 18:00 | 15/May/14 18:00 | 3.3.3, 3.4.0 | 3.5.0 | server | 0 | 3 | ZOOKEEPER-805, ZOOKEEPER-1346 | When transferring a large amount of information from a 4 letter word, especially in interactive mode (telnet or nc) over a slower network link, the connection can be closed before all of the data has reached the client. This is due to the way we handle nc non-interactive mode, by cancelling the selector key. Instead of cancelling the selector key for 4-letter-words, we should instead flag the NIOServerCnxn to ignore detection of a close condition on that socket (CancelledKeyException, EndOfStreamException). Since the 4lw will close the connection immediately upon completion, this should be safe to do. See ZOOKEEPER-737 for more details |
23 | No Perforce job exists for this issue. | 3 | 32668 | 6 years, 24 weeks, 1 day ago | We'll address the problem in ZOOKEEPER-1346 by moving the 4lws to a separate port. | 0|i05ydj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1196 | improve Kerberos name parsing and canonicalization testing |
Improvement | Open | Major | Unresolved | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 20/Sep/11 13:28 | 24/Sep/11 10:32 | server, tests | 0 | 0 | ZOOKEEPER-1195 | Currently we are not testing Kerberos name parsing. Kerberos name parsing is error-prone because Kerberos principals are complex; see http://web.mit.edu/kerberos/krb5-1.5/krb5-1.5.4/doc/krb5-user/What-is-a-Kerberos-Principal_003f.html. Bugs such as https://issues.apache.org/jira/browse/ZOOKEEPER-1195 would have been caught, had we better tests. Although we cannot test (yet) a full end-to-end KDC realm, we can at least test Kerberos principal syntax and semantics. |
2382 | No Perforce job exists for this issue. | 1 | 42035 | 8 years, 26 weeks, 5 days ago | security | 0|i07k67: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1195 | SASL authorizedID being incorrectly set: should use getHostName() rather than getServiceName() |
Bug | Closed | Major | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 20/Sep/11 11:15 | 01/May/13 22:29 | 29/Sep/11 03:42 | 3.4.0 | 3.4.0 | 0 | 2 | ZOOKEEPER-1201, ZOOKEEPER-938, ZOOKEEPER-1196 | Tom Klonikowski writes: Hello developers, the SaslServerCallbackHandler in trunk changes the principal name service/host@REALM to service/service@REALM (i guess unintentionally). lines 131-133: if (!removeHost() && (kerberosName.getHostName() != null)) { userName += "/" + kerberosName.getServiceName(); } Server Log: SaslServerCallbackHandler@115] - Successfully authenticated client: authenticationID=fetcher/ubook@QUINZOO; authorizationID=fetcher/ubook@QUINZOO. SaslServerCallbackHandler@137] - Setting authorizedID: fetcher/fetcher@QUINZOO |
24 | No Perforce job exists for this issue. | 2 | 32669 | 8 years, 26 weeks ago | One-line fix for bug identified by Tom Klonikowski | 0|i05ydr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1194 | Two possible race conditions during leader establishment |
Bug | Closed | Major | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 19/Sep/11 20:10 | 23/Nov/11 14:22 | 05/Nov/11 02:38 | 3.4.0, 3.5.0 | server | 0 | 1 | ZOOKEEPER-1270 | Leader.getEpochToPropose() and Leader.waitForNewEpoch() act as barriers - they make sure that a leader/follower can return from calling the method only once connectingFollowers (or electingFollowers) contain a quorum. But these methods don't make sure that the leader itself is in connectingFollowers/electingFollowers. So the leader didn't necessarily reach the barrier when followers pass it. This can cause the following problems: 1. If the leader is not in connectingFollowers when a LearnerHandler returns from getEpochToPropose(), then the epoch sent by the leader to the follower might be smaller than the leader's own last accepted epoch. 2. If the leader is not in electingFollowers when LearnerHandler returns from waitForNewEpoch() then the leader will send a NEWLEADER message to followers, and the followers will respond, but it is possible that the NEWLEADER message is not in outstandingProposals when these NEWLEADER acks arrive, which will cause the NEWLEADER acks to be dropped. To fix this I propose to explicitly check that the leader is in connectingFollowers/electingFollowers before anyone can pass these barriers. |
25 | No Perforce job exists for this issue. | 3 | 32670 | 8 years, 20 weeks, 5 days ago | 0|i05ydz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
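The barrier fix proposed in ZOOKEEPER-1194 above can be sketched generically: participants may pass only once a quorum is present AND the leader itself has registered. The class and names below are illustrative, not ZooKeeper's actual Leader code.

```java
import java.util.HashSet;
import java.util.Set;

// A minimal quorum barrier sketch. Checking only the quorum size would
// reintroduce the race described above, where followers pass the barrier
// before the leader has reached it; so the condition also requires that
// the leader's own id is among the participants.
public class QuorumBarrier {
    private final Set<Long> participants = new HashSet<>();
    private final int quorumSize;
    private final long leaderId;

    QuorumBarrier(int quorumSize, long leaderId) {
        this.quorumSize = quorumSize;
        this.leaderId = leaderId;
    }

    synchronized void arrive(long id) throws InterruptedException {
        participants.add(id);
        notifyAll();
        // Wait until a quorum is present AND the leader has arrived.
        while (participants.size() < quorumSize
                || !participants.contains(leaderId)) {
            wait();
        }
    }
}
```

In a real server this wait would also carry a timeout, as the Leader code does; it is omitted here to keep the barrier condition itself in focus.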
| ZooKeeper | ZOOKEEPER-1193 | Remove upgrade code |
Task | Resolved | Minor | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 19/Sep/11 08:58 | 03/Apr/15 02:32 | 20/Oct/11 12:49 | 3.5.0 | 0 | 1 | ZOOKEEPER-2157 | ZOOKEEPER-5 introduced the upgrade feature in October 2008. It may be time to think about whether there are still installations in the wild that need this upgrade feature. Otherwise the respective code can be removed. Even if there should be old installations, couldn't they just use some ZK 3.x version to upgrade, and we could still remove the upgrade code from the trunk? |
2383 | No Perforce job exists for this issue. | 1 | 33317 | 8 years, 22 weeks, 6 days ago |
Reviewed
|
0|i062dr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1192 | Leader.waitForEpochAck() checks waitingForNewEpoch instead of checking electionFinished |
Bug | Closed | Critical | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 18/Sep/11 22:00 | 23/Nov/11 14:22 | 05/Nov/11 02:15 | 3.4.0, 3.5.0 | server | 0 | 2 | ZOOKEEPER-1191 | A follower/leader should block in Leader.waitForEpochAck() until either electingFollowers contains a quorum and electionFinished=true or until a timeout occurs. A timeout means that a quorum of followers didn't ack the epoch on time, which is an error. But the check in Leader.waitForEpochAck() is "if (waitingForNewEpoch) throw..." and this will never be triggered, even if the wait statement just timed out, because Leader.getEpochToPropose() completes and sets waitingForNewEpoch to false before Leader.waitForEpochAck() is invoked. Instead of "if (waitingForNewEpoch) throw" the condition in Leader.waitForEpochAck() should be "if (!electionFinished) throw". The guarded block introduced in ZK-1191 should be checking !electionFinished. |
26 | No Perforce job exists for this issue. | 3 | 32671 | 8 years, 20 weeks, 5 days ago | 0|i05ye7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1191 | ZOOKEEPER-1192 Synchronization issue - wait not in guarded block |
Sub-task | Resolved | Minor | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 17/Sep/11 15:53 | 13/Apr/14 22:10 | 13/Apr/14 22:10 | 3.4.0 | 3.5.0 | server | 0 | 0 | In Leader.java, getEpochToPropose() and waitForEpochAck() have the following code: if (readyToStart && verifier.containsQuorum(electingFollowers)) { electionFinished = true; electingFollowers.notifyAll(); } else { electingFollowers.wait(self.getInitLimit()*self.getTickTime()); if (waitingForNewEpoch) { throw new InterruptedException("Out of time to propose an epoch"); } } In Java, the wait statement can wake up without being notified, interrupted, or timing out, a so-called spurious wakeup. So it should be guarded by a while loop with the condition we're waiting for. |
2384 | No Perforce job exists for this issue. | 2 | 42036 | 8 years, 27 weeks ago | 0|i07k6f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
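The spurious-wakeup fix described in ZOOKEEPER-1191 above follows a standard pattern: Object.wait() may return without being notified, so it must sit inside a loop that re-checks the condition. This is a generic sketch with illustrative names (not the actual Leader.java code), adding the deadline bookkeeping the while-loop form needs.

```java
public class GuardedWait {
    private final Object lock = new Object();
    private boolean electionFinished = false;

    // Wait until electionFinished or the deadline passes. A bare
    // wait(timeout) followed by a flag check, as in the original code,
    // is vulnerable to spurious wakeups; the while loop is the guard.
    void awaitElection(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (lock) {
            while (!electionFinished) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    throw new InterruptedException("Out of time to propose an epoch");
                }
                lock.wait(remaining);
            }
        }
    }

    void finishElection() {
        synchronized (lock) {
            electionFinished = true;
            lock.notifyAll();
        }
    }
}
```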
| ZooKeeper | ZOOKEEPER-1190 | ant package is not including many of the bin scripts in the package (zkServer.sh for example) |
Bug | Closed | Blocker | Fixed | Eric Yang | Patrick D. Hunt | Patrick D. Hunt | 16/Sep/11 20:30 | 23/Nov/11 14:22 | 07/Oct/11 16:48 | 3.4.0, 3.5.0 | 3.4.0, 3.5.0 | build | 0 | 2 | ZOOKEEPER-999 | run "ant package" and look in the build/zookeeper-<version>/bin directory. many of the bin scripts are missing. |
161 | No Perforce job exists for this issue. | 2 | 32672 | 8 years, 24 weeks, 5 days ago | 0|i05yef: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1189 | For an invalid snapshot file (less than 10 bytes in size) the RandomAccessFile stream is leaking. |
Bug | Closed | Major | Fixed | Rakesh Radhakrishnan | Rakesh Radhakrishnan | Rakesh Radhakrishnan | 16/Sep/11 10:29 | 23/Nov/11 14:22 | 26/Sep/11 21:11 | 3.3.3 | 3.3.4, 3.4.0, 3.5.0 | server | 0 | 3 | When loading the snapshot, ZooKeeper will consider only snapshots with at least 10 bytes of size. Otherwise it will ignore the file and just return without closing the RandomAccessFile. {noformat} Util.isValidSnapshot() having the following logic. // Check for a valid snapshot RandomAccessFile raf = new RandomAccessFile(f, "r"); // including the header and the last / bytes // the snapshot should be atleast 10 bytes if (raf.length() < 10) { return false; } {noformat} Since the snapshot file validation logic is outside the try block, it won't go to the finally block and the stream will be leaked. Suggestion: Move the validation logic into the try/catch block. |
27 | No Perforce job exists for this issue. | 3 | 32673 | 8 years, 26 weeks, 2 days ago |
Incompatible change, Reviewed
|
0|i05yen: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
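The leak described in ZOOKEEPER-1189 above can be sketched with modern Java. This is an illustrative reconstruction, not the actual Util code: the length check sits inside a try-with-resources block, so the early return for a too-small file still closes the stream (the era-appropriate fix used try/finally, but the principle is identical).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SnapshotCheck {
    // Sketch of the suggested fix: open and validate inside the resource
    // block, so every exit path, including the "file too small" early
    // return, closes the RandomAccessFile instead of leaking it.
    static boolean isValidSnapshot(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            // Including the header and trailing bytes, a snapshot
            // should be at least 10 bytes.
            if (raf.length() < 10) {
                return false;
            }
            // Further validation would go here; the stream still closes.
            return true;
        }
    }
}
```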
| ZooKeeper | ZOOKEEPER-1188 | client should detect a broken-network itself |
Wish | Open | Major | Unresolved | Unassigned | helei | helei | 15/Sep/11 22:47 | 15/Sep/11 22:47 | 3.3.3 | c client | 0 | 0 | The client receives the session-expired event only after the connection with the servers has recovered. But I think the client should detect it itself, after staying in a lost-connection state for a session-expire-time period. Why do we always wait for the message from the servers? | 2385 | No Perforce job exists for this issue. | 0 | 42037 | 8 years, 27 weeks, 6 days ago | 0|i07k6n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1187 | remove jdk dependency from the rpm spec |
Improvement | Resolved | Major | Won't Fix | Giridharan Kesavan | Giridharan Kesavan | Giridharan Kesavan | 15/Sep/11 13:53 | 03/Mar/16 11:21 | 03/Mar/16 11:21 | 0 | 0 | remove jdk dependency from the rpm spec | 2386 | No Perforce job exists for this issue. | 1 | 42038 | 4 years, 3 weeks ago | 0|i07k6v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1186 | ZooKeeper client seems to hang quietly on OutOfMemoryError |
Bug | Resolved | Major | Duplicate | Unassigned | Stepan Koltsov | Stepan Koltsov | 15/Sep/11 09:21 | 01/Nov/11 12:06 | 01/Nov/11 12:06 | 3.3.3 | java client | 0 | 0 | ZOOKEEPER-1100 | The ZooKeeper client seems to hang quietly on OutOfMemoryError. Look at the code of ClientCnxn.SendThread.run: {code} void run() { while (zooKeeper.state.isAlive()) { try { ... } catch (Exception e) { // handle exception and restart } } ... } {code} If an OutOfMemoryError happens somewhere inside the try block, the thread just exits and ZooKeeper hangs. The client should handle any Throwable the same way it handles Exception. |
2387 | No Perforce job exists for this issue. | 0 | 32674 | 8 years, 25 weeks, 1 day ago | 0|i05yev: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
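The hardening suggested in ZOOKEEPER-1186 above can be sketched generically (names below are illustrative, not ClientCnxn's): a long-running loop that catches only Exception dies silently when an Error such as OutOfMemoryError escapes, whereas catching Throwable lets it log and continue or shut down deliberately.

```java
public class ResilientLoop {
    // A unit of work that the loop executes repeatedly.
    interface Step {
        void run() throws Exception;
    }

    // Runs each step, counting the ones that complete. Catching Throwable
    // (not just Exception) means an Error like OutOfMemoryError does not
    // silently terminate the loop, which is the hang described above.
    static int runSteps(Step[] steps) {
        int completed = 0;
        for (Step s : steps) {
            try {
                s.run();
                completed++;
            } catch (Throwable t) {
                // Log and continue (or trigger an orderly shutdown);
                // with `catch (Exception e)` an Error would escape here
                // and kill the thread without notifying anyone.
                System.err.println("step failed: " + t);
            }
        }
        return completed;
    }
}
```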
| ZooKeeper | ZOOKEEPER-1185 | Send AuthFailed event to client if SASL authentication fails |
Bug | Closed | Major | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 14/Sep/11 21:11 | 01/May/13 22:29 | 26/Sep/11 22:09 | 3.4.0 | 3.4.0, 3.5.0 | java client | 0 | 2 | ZOOKEEPER-938 | There are 3 places where ClientCnxn should queue a AuthFailed event if client fails to authenticate. Without sending this event, clients may be stuck watching for a SaslAuthenticated event that will never come (since the client failed to authenticate). |
kerberos, security | 28 | No Perforce job exists for this issue. | 2 | 32675 | 8 years, 26 weeks, 2 days ago | This patch fixes SaslAuthFailTest.testBadSaslAuthNotifiesWatch() to test for the AuthFailed event : previously, the test was incorrectly not testing for this event. It also removes the testBadSaslAuthNotifiesWatch() method from the SaslAuthTest class : this method belongs in SaslAuthFailTest, not SaslAuthTest. The former tests unsuccessful SASL authentication; the latter, successful SASL authentication. |
0|i05yf3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1184 | jute generated files are not being cleaned up via "ant clean" |
Bug | Resolved | Major | Fixed | Thomas Koch | Patrick D. Hunt | Patrick D. Hunt | 14/Sep/11 18:48 | 17/Sep/11 06:56 | 16/Sep/11 20:25 | 3.5.0 | 3.5.0 | build | 0 | 2 | The change for ZOOKEEPER-96 has removed the generated files from SVN, it seems that these files should now live under build subdir? If this change is made be sure that the C/contrib/recipes environment is not broken... | 3960 | No Perforce job exists for this issue. | 1 | 32676 | 8 years, 27 weeks, 5 days ago |
Reviewed
|
0|i05yfb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1183 | Enhance LogFormatter to output additional detail from transaction log |
Improvement | Patch Available | Minor | Unresolved | kishore gopalakrishna | kishore gopalakrishna | kishore gopalakrishna | 14/Sep/11 14:57 | 10/Oct/13 19:32 | 3.4.0 | 0 | 0 | The current LogFormatter prints the following information: ZooKeeper Transactional Log File with dbid 0 txnlog format version 2 8/15/11 1:55:36 PM PDT session 0x131cf1a236f0014 cxid 0x0 zxid 0xf01 createSession 8/15/11 1:55:57 PM PDT session 0x131cf1a236f0000 cxid 0x55f zxid 0xf02 setData 8/15/11 1:56:00 PM PDT session 0x131cf1a236f0015 cxid 0x0 zxid 0xf03 createSession ... .. 8/15/11 2:00:33 PM PDT session 0x131cf1a236f001c cxid 0x36 zxid 0xf6b setData 8/15/11 2:00:33 PM PDT session 0x131cf1a236f0021 cxid 0xa1 zxid 0xf6c create 8/15/11 2:00:33 PM PDT session 0x131cf1a236f001b cxid 0x3e zxid 0xf6d setData 8/15/11 2:00:33 PM PDT session 0x131cf1a236f001e cxid 0x3e zxid 0xf6e setData 8/15/11 2:00:33 PM PDT session 0x131cf1a236f001d cxid 0x41 zxid 0xf6f setData Though this is good information, it does not provide additional detail like createSession: which IP created the session and its timeout set|get|delete: the path and data create: path created and createmode along with data We can add an additional parameter -detail and provide detailed output of the transaction. Outputting data is slightly tricky since we can't print data without understanding the format. We need not print this for now. |
2388 | No Perforce job exists for this issue. | 1 | 42039 | 6 years, 24 weeks ago | 0|i07k73: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1182 | Make findbugs usable in Eclipse |
Task | Resolved | Minor | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 14/Sep/11 05:42 | 17/Sep/11 06:56 | 16/Sep/11 20:13 | 3.5.0 | 0 | 1 | I did not find any way how one could tell the eclipse findbugs extension to ignore the java files under src/java/test. I already use src/java/test/config/findbugsExcludeFile.xml but there are still many findbug warnings. So this patch solves the most obvious findbugs warnings under src/java/test. There are 30 remaining warnings which could either be ignored in the exclude file or solved by somebody with more knowledge about the code. |
3961 | No Perforce job exists for this issue. | 2 | 33318 | 8 years, 27 weeks, 5 days ago |
Reviewed
|
cleanup, cleancode | 0|i062dz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1181 | Fix problems with Kerberos TGT renewal |
Bug | Closed | Major | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 13/Sep/11 19:38 | 01/May/13 22:29 | 24/Oct/11 02:47 | 3.4.0 | 3.4.0, 3.5.0 | java client, server | 0 | 3 | ZOOKEEPER-938, ZOOKEEPER-1205 | Currently, in Zookeeper trunk, there are two problems with Kerberos TGT renewal: 1. TGTs obtained from a keytab are not refreshed periodically. They should be, just as those from ticket cache are refreshed. 2. Ticket renewal should be retried if it fails. Ticket renewal might fail if two or more separate processes (different JVMs) running as the same user try to renew Kerberos credentials at the same time. |
kerberos, security | 29 | No Perforce job exists for this issue. | 3 | 32677 | 8 years, 22 weeks, 3 days ago | -Fixes two findbugs warnings related to holding a lock while sleeping. -Addresses Camille's point: merge two almost-identical retry methods into a single retry method. |
Reviewed
|
0|i05yfj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1180 | New entry for files ignored by svn. |
Improvement | Resolved | Minor | Implemented | Patrick D. Hunt | Warren Turkal | Warren Turkal | 13/Sep/11 19:32 | 23/Oct/13 07:09 | 22/Oct/13 19:26 | 0 | 1 | 900 | 900 | 0% | The following entry needs to be added to the svn:ignore property for src/java/lib: ant-eclipse-*.jar This will ignore the ant-eclipse-*.jar file which is downloaded when running the ant "eclipse" target. |
0% | 0% | 900 | 900 | 2389 | No Perforce job exists for this issue. | 0 | 42040 | 6 years, 22 weeks, 1 day ago | 0|i07k7b: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1179 | NettyServerCnxn does not properly close socket on 4 letter word requests |
Bug | Closed | Critical | Fixed | Rakesh Radhakrishnan | Camille Fournier | Camille Fournier | 13/Sep/11 12:20 | 13/Mar/14 14:17 | 11/Feb/14 20:18 | 3.4.0 | 3.4.6, 3.5.0 | server | 0 | 6 | ZOOKEEPER-1833, ZOOKEEPER-1839 | When calling a 4-letter-word to a server configured to use NettyServerCnxnFactory, the factory will not properly cancel all the keys and close the socket after sending the response for the 4lw. The close request will throw this exception, and the thread will not shut down: 2011-09-13 12:14:17,546 - WARN [New I/O server worker #1-1:NettyServerCnxnFactory$CnxnChannelHandler@117] - Exception caught [id: 0x009300cc, /1.1.1.1:38542 => /139.172.114.138:2181] EXCEPTION: java.io.IOException: A non-blocking socket operation could not be completed immediately java.io.IOException: A non-blocking socket operation could not be completed immediately at sun.nio.ch.SocketDispatcher.close0(Native Method) at sun.nio.ch.SocketDispatcher.preClose(SocketDispatcher.java:44) at sun.nio.ch.SocketChannelImpl.implCloseSelectableChannel(SocketChannelImpl.java:684) at java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:201) at java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:97) at org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:593) at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:119) at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:76) at org.jboss.netty.channel.Channels.close(Channels.java:720) at org.jboss.netty.channel.AbstractChannel.close(AbstractChannel.java:208) at org.apache.zookeeper.server.NettyServerCnxn.close(NettyServerCnxn.java:116) at org.apache.zookeeper.server.NettyServerCnxn.cleanupWriterSocket(NettyServerCnxn.java:241) at org.apache.zookeeper.server.NettyServerCnxn.access$0(NettyServerCnxn.java:231) at 
org.apache.zookeeper.server.NettyServerCnxn$CommandThread.run(NettyServerCnxn.java:314) at org.apache.zookeeper.server.NettyServerCnxn$CommandThread.start(NettyServerCnxn.java:305) at org.apache.zookeeper.server.NettyServerCnxn.checkFourLetterWord(NettyServerCnxn.java:674) at org.apache.zookeeper.server.NettyServerCnxn.receiveMessage(NettyServerCnxn.java:791) at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.processMessage(NettyServerCnxnFactory.java:217) at org.apache.zookeeper.server.NettyServerCnxnFactory$CnxnChannelHandler.messageReceived(NettyServerCnxnFactory.java:141) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274) at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:350) at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:281) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:201) at org.jboss.netty.util.internal.IoWorkerRunnable.run(IoWorkerRunnable.java:46) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) |
2390 | No Perforce job exists for this issue. | 3 | 32678 | 6 years, 2 weeks ago | Thanks Rakesh, you are right, this error is not happening anymore. Flavio, I'm closing this. | 0|i05yfr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1178 | Add eclipse target for supporting Apache IvyDE |
Improvement | Patch Available | Minor | Unresolved | Warren Turkal | Warren Turkal | Warren Turkal | 12/Sep/11 12:03 | 05/Feb/20 07:12 | 3.7.0, 3.5.8 | build | 1 | 5 | 3600 | 3600 | 0% | HIVE-2739 | Mac OS X w/ Eclipse 3.7. However, I believe this will work in any Eclipse environment. | This patch adds support for Eclipse with Apache IvyDE, which is the extension that integrates Ivy support into Eclipse. This allows the creation of what appear to be fully portable .eclipse and .classpath files. I will be posting a patch shortly. | 0% | 0% | 3600 | 3600 | 2391 | No Perforce job exists for this issue. | 1 | 2442 | 3 years, 39 weeks, 2 days ago | Add support for Eclipse with Apache IvyDE extension | 0|i00ru7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1177 | Enabling a large number of watches for a large number of clients |
Improvement | Resolved | Major | Fixed | Fangmin Lv | Vishal Kathuria | Vishal Kathuria | 09/Sep/11 18:59 | 28/Sep/18 19:20 | 28/Sep/18 17:38 | 3.3.3 | 3.6.0 | server | 1 | 21 | 0 | 49800 | In my ZooKeeper, I see watch manager consuming several GB of memory and I dug a bit deeper. In the scenario I am testing, I have 10K clients connected to an observer. There are about 20K znodes in ZooKeeper, each is about 1K - so about 20M data in total. Each client fetches and puts watches on all the znodes. That is 200 million watches. It seems a single watch takes about 100 bytes. I am currently at 14528037 watches and according to the yourkit profiler, WatchManager has 1.2 G already. This is not going to work as it might end up needing 20G of RAM just for the watches. So we need a more compact way of storing watches. Here are the possible solutions. 1. Use a bitmap instead of the current hashmap. In this approach, each znode would get a unique id when its gets created. For every session, we can keep track of a bitmap that indicates the set of znodes this session is watching. A bitmap, assuming a 100K znodes, would be 12K. For 10K sessions, we can keep track of watches using 120M instead of 20G. 2. This second idea is based on the observation that clients watch znodes in sets (for example all znodes under a folder). Multiple clients watch the same set and the total number of sets is a couple of orders of magnitude smaller than the total number of znodes. In my scenario, there are about 100 sets. So instead of keeping track of watches at the znode level, keep track of it at the set level. It may mean that get may also need to be implemented at the set level. With this, we can save the watches in 100M. Are there any other suggestions of solutions? Thanks |
100% | 100% | 49800 | 0 | pull-request-available | 2392 | No Perforce job exists for this issue. | 5 | 42041 | 1 year, 24 weeks, 6 days ago | Changes to the watch manager to support very large (200 million) watches. This change also improves the synchronization in the WatchManager to reduce the contention on various watch manager operations (mainly addWatch() which is a fairly common operation after trigger watch). |
Reviewed
|
0|i07k7j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
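The bitmap scheme proposed as option 1 in ZOOKEEPER-1177 above can be sketched in Java as follows. This is a minimal illustration under the issue's assumptions, not code from the ZooKeeper tree; all class and method names are hypothetical.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch of proposal 1: instead of a per-path map of watcher sets, assign each
// znode a dense integer id and track each session's watches in a BitSet.
// With ~100K znodes a BitSet is ~12KB per session, versus ~100 bytes per
// individual watch in the existing WatchManager.
public class Main {
    static final Map<String, Integer> pathToId = new HashMap<>();
    static final Map<Long, BitSet> sessionWatches = new HashMap<>();
    static int nextId = 0;

    // Assign a dense id the first time a path is seen (i.e. when it is created).
    static int idFor(String path) {
        return pathToId.computeIfAbsent(path, p -> nextId++);
    }

    static void addWatch(long sessionId, String path) {
        sessionWatches.computeIfAbsent(sessionId, s -> new BitSet()).set(idFor(path));
    }

    static boolean isWatching(long sessionId, String path) {
        BitSet b = sessionWatches.get(sessionId);
        return b != null && b.get(idFor(path));
    }

    public static void main(String[] args) {
        addWatch(0x1L, "/app/config");
        System.out.println(isWatching(0x1L, "/app/config")); // true
        System.out.println(isWatching(0x1L, "/app/other"));  // false
    }
}
```

The trade-off, as the description implies, is that the bitmap costs the same per session regardless of how few znodes that session actually watches.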
| ZooKeeper | ZOOKEEPER-1176 | Remove dead code and basic cleanup in DataTree |
Task | Resolved | Major | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 09/Sep/11 11:44 | 17/Sep/11 06:56 | 16/Sep/11 20:36 | 3.5.0 | 0 | 1 | - DataTree members scount, initialized and method listACLEquals are never used - transform if(!C) B else A to if(C) A else B (removes one indirection to follow for the brain) - remove unused imports and one annotation - add method getApproximateDataSize to DataNode (I work towards an immutable DataNode without public properties) - move assignments (lastPrefix = getMaxPrefixWithQuota(path)) out of if statements - combine nested if statements: if A if B then C => if A && B => C - make ACL maps private and add getAclSize() to hide implementation details of the ACLs. |
3962 | No Perforce job exists for this issue. | 5 | 33319 | 8 years, 27 weeks, 5 days ago |
Reviewed
|
cleanup, cleancode | 0|i062e7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1175 | DataNode references parent node for no reason |
Improvement | Resolved | Minor | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 08/Sep/11 14:53 | 15/Sep/11 06:56 | 14/Sep/11 18:55 | 3.5.0 | 0 | 1 | Having the parent referenced in a node makes the tree building harder than it needs to be. With the parent you need to get the parent before you can create the DataNode. Without the parent in the DataNode one can have a method tree.put(String path, new DataNode(...)). | 3963 | No Perforce job exists for this issue. | 2 | 33320 | 8 years, 28 weeks ago |
Reviewed
|
0|i062ef: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1174 | FD leak when network unreachable |
Bug | Closed | Critical | Fixed | Ted Dunning | Ted Dunning | Ted Dunning | 08/Sep/11 14:47 | 23/Nov/11 14:22 | 30/Sep/11 18:02 | 3.3.3 | 3.3.4, 3.4.0, 3.5.0 | java client | 0 | 2 | ZOOKEEPER-1271, ZOOKEEPER-1205 | In the socket connection logic there are several errors that result in bad behavior. The basic problem is that a socket is registered with a selector unconditionally when there are nuances that should be dealt with. First, the socket may connect immediately. Secondly, the connect may throw an exception. In either of these two cases, I don't think that the socket should be registered. I will attach a test case that demonstrates the problem. I have been unable to create a unit test that exhibits the problem because I would have to mock the low level socket libraries to do so. It would still be good to do so if somebody can figure out a good way. |
167 | No Perforce job exists for this issue. | 8 | 32679 | 8 years, 21 weeks, 2 days ago | 0|i05yfz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
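The connect-handling fix that ZOOKEEPER-1174 argues for (register with the selector only when the connect is actually pending, and release the socket if connect() throws, so the file descriptor is not leaked) can be sketched as below. The method name `safeConnect` is hypothetical; this is not the actual ClientCnxn code.

```java
import java.net.InetSocketAddress;
import java.nio.channels.*;

public class Main {
    // Register the channel only when it makes sense to: not unconditionally.
    static SelectionKey safeConnect(Selector selector, InetSocketAddress addr) {
        SocketChannel sc = null;
        try {
            sc = SocketChannel.open();
            sc.configureBlocking(false);
            if (sc.connect(addr)) {
                // Case 1 from the report: connected immediately (e.g. loopback).
                // Register for I/O, not OP_CONNECT; finishConnect() is not needed.
                return sc.register(selector, SelectionKey.OP_READ);
            }
            // Connection pending: now waiting for OP_CONNECT is correct.
            return sc.register(selector, SelectionKey.OP_CONNECT);
        } catch (Exception e) {
            // Case 2 from the report: connect() threw. Do not register, and
            // close the channel so the FD is released.
            try { if (sc != null) sc.close(); } catch (Exception ignore) {}
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        // An unresolved address makes connect() throw; the sketch must return
        // null without leaking the channel.
        SelectionKey k = safeConnect(selector,
                InetSocketAddress.createUnresolved("example.invalid", 2181));
        System.out.println(k == null); // true
        selector.close();
    }
}
```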
| ZooKeeper | ZOOKEEPER-1173 | Server never forgets old ACL lists |
Bug | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 07/Sep/11 10:47 | 07/Sep/11 10:49 | server | 0 | 1 | ZOOKEEPER-1035, ZOOKEEPER-1055, ZOOKEEPER-33 | The ACL stuff in DataTree.java reimplements a kind of reference system. The idea may have been to save memory for equal ACL lists. However there's no code that ever removes an ACL list that is not used anymore. Related: - The ACL stuff could be in a separate class so that DataTree.java is not such a big beast anymore. - It's risky to have mutable objects (list) as keys in a HashMap. An idea to solve this: Have ACL lists as members of the datatree nodes. Lookup already existing ACL lists in a java.util.WeakHashMap. |
acl | 2393 | No Perforce job exists for this issue. | 0 | 32680 | 8 years, 29 weeks, 1 day ago | 0|i05yg7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
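The WeakHashMap idea at the end of ZOOKEEPER-1173 can be sketched as below (all names hypothetical). One subtlety: a WeakHashMap holds its values strongly, so the interned list must sit behind a WeakReference; storing it directly as the value would pin its own key and defeat collection. The sketch also sidesteps the mutable-key hazard the report notes by interning immutable copies.

```java
import java.lang.ref.WeakReference;
import java.util.*;

// Sketch of the suggested scheme: nodes hold their ACL list directly, and
// equal lists are deduplicated through a WeakHashMap, so an ACL list no node
// references any more can simply be garbage collected.
public class Main {
    static final Map<List<String>, WeakReference<List<String>>> interned =
            new WeakHashMap<>();

    static synchronized List<String> intern(List<String> acl) {
        List<String> key = List.copyOf(acl); // immutable snapshot, safe as a key
        WeakReference<List<String>> ref = interned.get(key);
        List<String> existing = (ref == null) ? null : ref.get();
        if (existing != null) return existing;
        // WeakReference so the map itself does not keep the list alive.
        interned.put(key, new WeakReference<>(key));
        return key;
    }

    public static void main(String[] args) {
        List<String> a = intern(new ArrayList<>(List.of("world:anyone:r")));
        List<String> b = intern(new ArrayList<>(List.of("world:anyone:r")));
        System.out.println(a == b); // same instance: true
    }
}
```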
| ZooKeeper | ZOOKEEPER-1172 | Support for custom org.apache.zookeeper.client.HostProvider implementation. |
Improvement | Patch Available | Major | Unresolved | César Álvarez Núñez | César Álvarez Núñez | César Álvarez Núñez | 06/Sep/11 10:41 | 14/Dec/19 06:09 | 3.7.0 | java client | 1 | 3 | The interface org.apache.zookeeper.client.HostProvider exists but is hardcoded to org.apache.zookeeper.client.StaticHostProvider in the Zookeeper constructor. Now it could be replaced by any other implementation just by calling the new Zookeeper constructor methods which accept a HostProvider as a parameter. |
30 | No Perforce job exists for this issue. | 3 | 2507 | 5 years, 50 weeks ago | Support for custom org.apache.zookeeper.client.HostProvider implementation with the help of new Zookeeper constructor methods. | 0|i00s8n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
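A pluggable HostProvider of the kind ZOOKEEPER-1172 requests might look like the sketch below. The real interface lives in org.apache.zookeeper.client; a local stand-in with the same three methods (size, next, onConnected) is declared here only so the example compiles without the ZooKeeper jar, and the round-robin policy is an illustrative choice, not the StaticHostProvider algorithm.

```java
import java.net.InetSocketAddress;
import java.util.List;

public class Main {
    // Local stand-in for org.apache.zookeeper.client.HostProvider.
    interface HostProvider {
        int size();
        InetSocketAddress next(long spinDelay);
        void onConnected();
    }

    // A minimal custom implementation: cycle through the servers in order.
    static class RoundRobinHostProvider implements HostProvider {
        private final List<InetSocketAddress> servers;
        private int index = -1;

        RoundRobinHostProvider(List<InetSocketAddress> servers) {
            if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
            this.servers = List.copyOf(servers);
        }

        public int size() { return servers.size(); }

        public InetSocketAddress next(long spinDelay) {
            index = (index + 1) % servers.size();
            return servers.get(index);
        }

        public void onConnected() { /* reset any backoff state here */ }
    }

    public static void main(String[] args) {
        HostProvider hp = new RoundRobinHostProvider(List.of(
                InetSocketAddress.createUnresolved("zk1", 2181),
                InetSocketAddress.createUnresolved("zk2", 2181)));
        System.out.println(hp.next(0).getHostString()); // zk1
        System.out.println(hp.next(0).getHostString()); // zk2
        System.out.println(hp.next(0).getHostString()); // zk1
    }
}
```

With the constructor overloads the issue proposes, an instance like this would simply be passed to the ZooKeeper client at construction time.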
| ZooKeeper | ZOOKEEPER-1171 | fix build for java 7 |
Bug | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 02/Sep/11 15:35 | 23/Nov/11 14:22 | 13/Sep/11 17:50 | 3.4.0 | 3.4.0 | build | 0 | 1 | I tried testing out zk on java 7 (not yet officially supported) but I ran into a road block due to the build failing. Patch coming next. | 3964 | No Perforce job exists for this issue. | 1 | 32681 | 8 years, 28 weeks ago |
Reviewed
|
0|i05ygf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1170 | Fix compiler (eclipse) warnings: unused imports, unused variables, missing generics |
Improvement | Resolved | Minor | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 02/Sep/11 15:12 | 15/Sep/11 06:56 | 14/Sep/11 18:23 | 3.5.0 | 0 | 1 | IDE warnings get useless if there are too many of them. This issue + patch fixes nearly the rest of them. | 3965 | No Perforce job exists for this issue. | 1 | 33321 | 8 years, 28 weeks ago |
Reviewed
|
cleanup, cleancode | 0|i062en: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1169 | Fix compiler (eclipse) warnings in (generated) jute code |
Improvement | Closed | Minor | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 02/Sep/11 14:39 | 23/Nov/11 14:22 | 02/Sep/11 16:54 | 3.4.0 | 0 | 0 | Fixes for compiled jute parser: - missing generic types - added @SuppressWarnings("unused") because javacc adds a dead throws clause at the end of functions. Fixes for code compiled by jute compiler: - remove import java.util.* and use full ref to java.util.Arrays One warning fixed in non-compiled code: src/java/main/org/apache/jute/compiler/JRecord.java Rationale: The warnings in your IDE (eclipse) get useless if there are tons of them. This patch reduces many of them. Another issue with patch will reduce them to 8. |
3966 | No Perforce job exists for this issue. | 1 | 33322 | 8 years, 29 weeks, 5 days ago |
Reviewed
|
cleanup, cleancode | 0|i062ev: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1168 | ZooKeeper fails to run with IKVM |
Bug | Closed | Major | Fixed | Andrew Finnell | Andrew Finnell | Andrew Finnell | 31/Aug/11 15:52 | 23/Nov/11 14:22 | 01/Sep/11 13:15 | 3.4.0 | 3.4.0 | jmx | 0 | 1 | 86400 | 86400 | 0% | All Architectures. Running with IKVM and OpenJDK instead of Sun JDK 6. | OS: Windows 64-bit JRE: IKVM 7.0.4258 IKVM 7.0.4258 does not support ManagementFactory.getPlatformMBeanServer(); It will throw a java.lang.Error. |
0% | 0% | 86400 | 86400 | 3967 | No Perforce job exists for this issue. | 2 | 32682 | 8 years, 29 weeks, 6 days ago |
Reviewed
|
0|i05ygn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1167 | C api lacks synchronous version of sync() call. |
Bug | Reopened | Major | Unresolved | Marshall McMullen | Nicholas Harteau | Nicholas Harteau | 31/Aug/11 10:22 | 05/Feb/20 07:15 | 3.3.3, 3.4.3, 3.5.0 | 3.7.0, 3.5.8 | c client | 2 | 7 | Reading through the source, the C API implements zoo_async() which is the zookeeper sync() method implemented in the multithreaded/asynchronous C API. It doesn't implement anything equivalent in the non-multithreaded API. I'm not sure if this was oversight or intentional, but it means that the non-multithreaded API can't guarantee consistent client views on critical reads. The zkperl bindings depend on the synchronous, non-multithreaded API so also can't call sync() currently. |
2394 | No Perforce job exists for this issue. | 1 | 32683 | 2 years, 44 weeks, 1 day ago | 0|i05ygv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1166 | Please add a few svn:ignore properties |
Improvement | Closed | Minor | Fixed | Patrick D. Hunt | Warren Turkal | Warren Turkal | 29/Aug/11 19:59 | 23/Nov/11 14:22 | 01/Sep/11 16:43 | 3.4.0 | 3.4.0 | build | 0 | 0 | 3600 | 3600 | 0% | Please add a couple svn:ignore properties to make dealing with the code slightly easier. At the root, please add an svn:ignore property for "build" so that the default build directory for eclipse is excluded. At src/java/lib, please add an svn:ignore property for "*.jar" so that jars acquired by ivy are ignored. |
0% | 0% | 3600 | 3600 | 3968 | No Perforce job exists for this issue. | 0 | 33323 | 8 years, 28 weeks, 2 days ago |
Reviewed
|
0|i062f3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1165 | better eclipse support in tests |
Bug | Closed | Minor | Fixed | Warren Turkal | Warren Turkal | Warren Turkal | 29/Aug/11 18:08 | 23/Nov/11 14:22 | 02/Sep/11 13:02 | 3.4.0 | 3.4.0 | tests | 0 | 2 | 3600 | 3600 | 0% | Eclipse | The Eclipse test runner tries to run tests from all classes that inherit from TestCase. However, this class is inherited by at least one class (org.apache.zookeeper.test.system.BaseSysTest) that has no test cases as it is used as infrastructure for other real test cases. This patch annotates that class with @Ignore, which causes the class to be ignored. Also, because annotations are not inherited by default, this patch will not affect classes that inherit from this class. | 0% | 0% | 3600 | 3600 | patch | 3969 | No Perforce job exists for this issue. | 1 | 32684 | 8 years, 29 weeks, 6 days ago | Small Eclipse test fix. |
Reviewed
|
0|i05yh3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
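The inheritance behavior ZOOKEEPER-1165 relies on (a Java annotation is not inherited by subclasses unless it is meta-annotated with @Inherited, and JUnit's @Ignore is not) can be demonstrated with a stdlib-only sketch; @SkipMe and the two classes are hypothetical stand-ins for @Ignore, BaseSysTest, and a concrete test.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class Main {
    // Hypothetical stand-in for JUnit's @Ignore; like @Ignore, it is
    // deliberately NOT meta-annotated with @Inherited.
    @Retention(RetentionPolicy.RUNTIME)
    @interface SkipMe {}

    @SkipMe
    static class BaseSysTestLike {}                       // annotated base class

    static class RealTestLike extends BaseSysTestLike {}  // concrete subclass

    public static void main(String[] args) {
        // The base class carries the annotation...
        System.out.println(BaseSysTestLike.class.isAnnotationPresent(SkipMe.class)); // true
        // ...but the subclass does not, so annotating the infrastructure class
        // does not accidentally skip the tests that extend it.
        System.out.println(RealTestLike.class.isAnnotationPresent(SkipMe.class)); // false
    }
}
```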
| ZooKeeper | ZOOKEEPER-1164 | Support encryption for C binding |
New Feature | Open | Major | Unresolved | Unassigned | Eric Yang | Eric Yang | 29/Aug/11 13:59 | 01/May/13 22:29 | 3.5.0 | c client | 0 | 0 | ZOOKEEPER-823 | If ZooKeeper is going to switch to netty for connections to support encryption, then C binding library and other language bindings should be updated to support communication through netty to support encryption. | 2395 | No Perforce job exists for this issue. | 0 | 42042 | 8 years, 30 weeks, 3 days ago | 0|i07k7r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1163 | Memory leak in zk_hashtable.c:do_insert_watcher_object() |
Bug | Resolved | Major | Fixed | Anupam Chanda | Anupam Chanda | Anupam Chanda | 25/Aug/11 13:47 | 02/Mar/16 20:36 | 25/Jun/12 14:09 | 3.3.3 | 3.3.6, 3.4.4, 3.5.0 | c client | 0 | 3 | zk_hashtable.c:do_insert_watcher_object() line number 193 calls add_to_list with clone flag set to 1. This leaks memory, since the original watcher object was already allocated on the heap by activateWatcher() line 330. I will upload a patch shortly. The fix is to set clone flag to 0 in the call to add_to_list(). |
2396 | No Perforce job exists for this issue. | 1 | 32685 | 7 years, 39 weeks, 2 days ago | 0|i05yhb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1162 | consistent handling of jute.maxbuffer when attempting to read large zk "directories" |
Improvement | Open | Major | Unresolved | Michael Han | Jonathan Hsieh | Jonathan Hsieh | 25/Aug/11 01:36 | 14/Dec/19 06:08 | 3.3.3 | 3.7.0 | server | 12 | 25 | HBASE-4246, ZOOKEEPER-706, ZOOKEEPER-2260, HBASE-14938 | Recently we encountered a situation where a zk directory got successfully populated with 250k elements. When our system attempted to read the znode dir, it failed because the contents of the dir exceeded the default 1mb jute.maxbuffer limit. There were a few odd things: 1) it seems odd that we could populate it to be very large but could not read the listing; 2) the workaround was bumping up jute.maxbuffer on the client side. Would it make more sense to have it reject adding new znodes if it exceeds jute.maxbuffer? Alternately, would it make sense to have zk dir listing ignore the jute.maxbuffer setting? |
2397 | No Perforce job exists for this issue. | 0 | 42043 | 3 years, 9 weeks, 1 day ago | 0|i07k7z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
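The client-side workaround mentioned in ZOOKEEPER-1162 works because jute.maxbuffer is read as a Java system property. One caveat: the client reads it once at class-load time, so it must be set before the ZooKeeper classes load, typically via -Djute.maxbuffer=... on the command line. A sketch of the mechanism (the 0xfffff fallback mirrors the roughly 1 MB default the report runs into; the 4 MB override is an arbitrary example, not a recommendation):

```java
// Illustrates how a jute.maxbuffer-style limit is sourced from a system
// property; configuredMaxBuffer is a hypothetical helper, not ZooKeeper code.
public class Main {
    static int configuredMaxBuffer() {
        // Fall back to ~1 MB when the property is unset.
        return Integer.getInteger("jute.maxbuffer", 0xfffff);
    }

    public static void main(String[] args) {
        System.setProperty("jute.maxbuffer", String.valueOf(4 * 1024 * 1024));
        System.out.println(configuredMaxBuffer()); // 4194304
    }
}
```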
| ZooKeeper | ZOOKEEPER-1161 | Provide an option for disabling auto-creation of the data directory |
New Feature | Resolved | Major | Fixed | Patrick D. Hunt | Roman Shaposhnik | Roman Shaposhnik | 24/Aug/11 16:17 | 07/Mar/12 05:58 | 06/Mar/12 03:23 | 3.5.0 | scripts, server | 0 | 2 | Currently if ZK starts and doesn't see an existing dataDir it tries to create it. There should be an option to tweak this behavior. As for the default, my personal opinion is to NOT allow autocreate. | 2398 | No Perforce job exists for this issue. | 3 | 12512 | 8 years, 3 weeks, 1 day ago |
Reviewed
|
0|i02hyv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1160 | ZOOKEEPER-1157 test timeouts are too small |
Sub-task | Closed | Major | Fixed | Benjamin Reed | Benjamin Reed | Benjamin Reed | 23/Aug/11 01:03 | 23/Nov/11 14:22 | 05/Sep/11 14:32 | 3.4.0 | tests | 0 | 0 | In reviewing some tests that weren't passing I noticed that the tick time was 2ms rather than the normal 2000ms. I think this is causing tests to fail on some slow/overloaded machines. | 3970 | No Perforce job exists for this issue. | 2 | 33324 | 8 years, 29 weeks, 2 days ago |
Reviewed
|
0|i062fb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1159 | ClientCnxn does not propagate session expiration indication |
Bug | Resolved | Major | Won't Fix | Andor Molnar | Andrew Kyle Purtell | Andrew Kyle Purtell | 20/Aug/11 13:16 | 08/May/18 16:43 | 08/May/18 16:42 | 3.4.0 | 3.4.0 | java client | 6 | 11 | HBASE-4235 | ClientCnxn does not always propagate session expiration indication up to clients. If a reconnection attempt fails because the session has since expired, the KeeperCode is still Disconnected, but shouldn't it be set to Expired? Perhaps like so: {code} --- a/src/java/main/org/apache/zookeeper/ClientCnxn.java +++ b/src/java/main/org/apache/zookeeper/ClientCnxn.java @@ -1160,6 +1160,7 @@ public class ClientCnxn { clientCnxnSocket.doTransport(to, pendingQueue, outgoingQueue); } catch (Exception e) { + Event.KeeperState eventState = Event.KeeperState.Disconnected; if (closing) { if (LOG.isDebugEnabled()) { // closing so this is expected @@ -1172,6 +1173,7 @@ public class ClientCnxn { // this is ugly, you have a better way speak up if (e instanceof SessionExpiredException) { LOG.info(e.getMessage() + ", closing socket connection"); + eventState = Event.KeeperState.Expired; } else if (e instanceof SessionTimeoutException) { LOG.info(e.getMessage() + RETRY_CONN_MSG); } else if (e instanceof EndOfStreamException) { @@ -1191,7 +1193,7 @@ public class ClientCnxn { if (state.isAlive()) { eventThread.queueEvent(new WatchedEvent( Event.EventType.None, - Event.KeeperState.Disconnected, + eventState, null)); } clientCnxnSocket.updateNow(); {code} This affects HBase. HBase master and region server processes will shut down by design if their session has expired, but will attempt to reconnect if they think they have been disconnected. The above prevents proper termination. |
165 | No Perforce job exists for this issue. | 0 | 32686 | 2 years, 4 weeks, 2 days ago | 0|i05yhj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1158 | C# client |
Improvement | Open | Major | Unresolved | Eric Hauser | Eric Hauser | Eric Hauser | 19/Aug/11 23:17 | 14/Dec/19 06:08 | 3.7.0 | 0 | 7 | Native C# client for ZooKeeper. | 2399 | No Perforce job exists for this issue. | 4 | 42044 | 7 years, 49 weeks, 2 days ago | 0|i07k87: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1157 | Some of the tests timeout or cause JVM crash |
Bug | Open | Minor | Unresolved | Unassigned | Vishal Kathuria | Vishal Kathuria | 19/Aug/11 15:58 | 29/Jun/12 13:11 | 3.3.3 | tests | 0 | 1 | ZOOKEEPER-1160 | The following tests are consistently timing out for me, and sometimes they crash the JVM. We need to look at these tests and make sure they pass consistently, otherwise they provide no value. org.apache.zookeeper.test.AsyncHammerTest org.apache.zookeeper.test.FollowerResyncConcurrencyTest org.apache.zookeeper.test.ObserverQuorumHammerTest org.apache.zookeeper.test.QuorumHammerTest org.apache.zookeeper.test.QuorumTest |
test | 2400 | No Perforce job exists for this issue. | 0 | 32687 | 8 years, 31 weeks, 2 days ago | 0|i05yhr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1156 | Log truncation truncating log too much - can cause data loss |
Bug | Closed | Blocker | Fixed | Vishal Kathuria | Vishal Kathuria | Vishal Kathuria | 18/Aug/11 13:48 | 23/Nov/11 14:22 | 05/Sep/11 16:04 | 3.3.3 | 3.3.4, 3.4.0 | quorum, server | 0 | 2 | 86400 | 86400 | 0% | The log truncation relies on position calculation for a particular zxid to figure out the new size of the log file. There is a bug in PositionInputStream implementation which skips counting the bytes in the log which have value 0. This can lead to underestimating the actual log size. The log records which should be there can get truncated, leading to data loss on the participant which is executing the trunc. Clients can see different values depending on whether they connect to the node on which trunc was executed. |
0% | 0% | 86400 | 86400 | 3971 | No Perforce job exists for this issue. | 1 | 32688 | 8 years, 29 weeks, 5 days ago | 0|i05yhz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
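The fix ZOOKEEPER-1156 implies amounts to a position-tracking stream that advances for every byte consumed, regardless of the byte's value. A hypothetical sketch (not the actual PositionInputStream from the ZooKeeper tree):

```java
import java.io.*;

public class Main {
    // Position must be keyed off HOW MANY bytes were read, never off the byte
    // values; the bug described above undercounted by skipping 0x00 bytes,
    // which made truncation underestimate the log size and drop valid records.
    static class CountingInputStream extends FilterInputStream {
        long position = 0;

        CountingInputStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = in.read();
            if (b != -1) position++; // count EVERY consumed byte, 0x00 included
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = in.read(buf, off, len);
            if (n > 0) position += n; // advance by bytes read, not bytes != 0
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] log = {1, 0, 0, 0, 5}; // zero bytes must still be counted
        CountingInputStream cis =
                new CountingInputStream(new ByteArrayInputStream(log));
        while (cis.read() != -1) {}
        System.out.println(cis.position); // 5
    }
}
```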
| ZooKeeper | ZOOKEEPER-1155 | Add windows automated builds (CI) for zookeeper c client bindings |
Improvement | Resolved | Major | Fixed | Camille Fournier | Dheeraj Agrawal | Dheeraj Agrawal | 16/Aug/11 13:26 | 20/Oct/11 10:09 | 20/Oct/11 10:09 | 3.3.4, 3.4.0 | c client | 0 | 1 | Set up a CI build on Windows to make sure that the new code checked in compiles fine on Windows (VS compilers) for the ZooKeeper C bindings. There is a ticket opened with the INFRA team to assign a build box and set up a CI env for the ZooKeeper C bindings: https://issues.apache.org/jira/browse/INFRA-3842 Feel free to help us with this effort; this will ensure that the new checkins don't break Windows builds. |
2401 | No Perforce job exists for this issue. | 2 | 33325 | 8 years, 23 weeks ago | 0|i062fj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1154 | Data inconsistency when the node(s) with the highest zxid is not present at the time of leader election |
Bug | Closed | Blocker | Fixed | Vishal Kathuria | Vishal Kathuria | Vishal Kathuria | 15/Aug/11 13:36 | 23/Nov/11 14:22 | 05/Sep/11 16:04 | 3.3.3 | 3.3.4, 3.4.0 | quorum | 0 | 2 | 1814400 | 1814400 | 0% | If a participant with the highest zxid (lets call it A) isn't present during leader election, a participant with a lower zxid (say B) might be chosen as a leader. When A comes up, it will replay the log with that higher zxid. The change that was in that higher zxid will only be visible to the clients connecting to the participant A, but not to other participants. I was able to reproduce this problem by 1. connect debugger to B and C and suspend them, so they don't write anything 2. Issue an update to the leader A. 3. After a few seconds, crash all servers (A,B,C) 4. Start B and C, let the leader election take place 5. Start A. 6. You will find that the update done in step 2 is visible on A but not on B,C, hence the inconsistency. Below is a more detailed analysis of what is happening in the code. Initial Condition 1. Lets say there are three nodes in the ensemble A,B,C with A being the leader 2. The current epoch is 7. 3. For simplicity of the example, lets say zxid is a two digit number, with epoch being the first digit. 4. The zxid is 73 5. All the nodes have seen the change 73 and have persistently logged it. Step 1 Request with zxid 74 is issued. The leader A writes it to the log but there is a crash of the entire ensemble and B,C never write the change 74 to their log. Step 3 B,C restart, A is still down B,C form the quorum B is the new leader. Lets say B minCommitLog is 71 and maxCommitLog is 73 epoch is now 8, zxid is 80 Request with zxid 81 is successful. On B, minCommitLog is now 71, maxCommitLog is 81 Step 4 A starts up. It applies the change in request with zxid 74 to its in-memory data tree A contacts B to registerAsFollower and provides 74 as its ZxId Since 71<=74<=81, B decides to send A the diff. B will send to A the proposal 81. 
Problem: The problem with the above sequence is that A's data tree has the update from request 74, which is not correct. Before getting the proposals 81, A should have received a trunc to 73. I don't see that in the code. If the maxCommitLog on B hadn't bumped to 81 but had stayed at 73, that case seems to be fine. |
0% | 0% | 1814400 | 1814400 | 3972 | No Perforce job exists for this issue. | 4 | 32689 | 8 years, 29 weeks, 5 days ago | 0|i05yi7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
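The description above uses simplified two-digit zxids (epoch 7, zxid 73). In the actual protocol a zxid is a 64-bit value packing the leader epoch into the high 32 bits and a per-epoch counter into the low 32 bits, which is why every zxid a new leader issues compares greater than any zxid from the previous epoch. The helper names below are illustrative:

```java
public class Main {
    // zxid layout: [ epoch : 32 bits | counter : 32 bits ]
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid)   { return zxid >>> 32; }
    static long counterOf(long zxid) { return zxid & 0xffffffffL; }

    public static void main(String[] args) {
        long zxid = makeZxid(7, 3); // the "73" of the example above
        System.out.println(epochOf(zxid));   // 7
        System.out.println(counterOf(zxid)); // 3
        // A new leader starts a new epoch, so its zxids always order after
        // the previous epoch's, no matter how high the old counter was:
        System.out.println(makeZxid(8, 0) > makeZxid(7, 999)); // true
    }
}
```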
| ZooKeeper | ZOOKEEPER-1153 | Deprecate AuthFLE and LE |
Improvement | Closed | Major | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 15/Aug/11 05:33 | 23/Nov/11 14:22 | 30/Aug/11 02:29 | 3.3.3 | 3.4.0 | 0 | 0 | I propose we mark these as deprecated in 3.4.0 and remove them in the following release. | 3973 | No Perforce job exists for this issue. | 2 | 33326 | 8 years, 30 weeks, 2 days ago |
Reviewed
|
0|i062fr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1152 | Exceptions thrown from handleAuthentication can cause buffer corruption issues in NIOServer |
Bug | Closed | Major | Fixed | Camille Fournier | Camille Fournier | Camille Fournier | 12/Aug/11 16:27 | 23/Nov/11 14:22 | 20/Aug/11 21:05 | 3.3.3, 3.4.0 | 3.4.0 | server | 0 | 1 | Exceptions thrown by an AuthenticationProvider's handleAuthentication method will not be caught, and can cause the buffers in the NIOServer to not read requests fully or properly. Any exceptions thrown here should be caught and treated as auth failure. | 3974 | No Perforce job exists for this issue. | 1 | 32690 | 8 years, 31 weeks, 3 days ago |
Reviewed
|
0|i05yif: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1151 | http://zookeeper.apache.org/doc/trunk/api/ missing |
Improvement | Open | Trivial | Unresolved | Unassigned | Eugene Joseph Koontz | Eugene Joseph Koontz | 10/Aug/11 13:57 | 10/Aug/11 13:57 | 3.4.0 | documentation | 0 | 0 | I see in http://zookeeper.apache.org/doc/ that we have http://zookeeper.apache.org/doc/trunk/, but http://zookeeper.apache.org/doc/trunk/api/ is a 404. I can generate the docs locally, but it would be useful to be able to be able to have URLs to reference the trunk API (e.g. for discussing new features in the JIRA). | 2402 | No Perforce job exists for this issue. | 0 | 42045 | 8 years, 33 weeks, 1 day ago | 0|i07k8f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1150 | ZOOKEEPER-1027 fix for this patch to compile on windows... |
Sub-task | Closed | Blocker | Fixed | Dheeraj Agrawal | Dheeraj Agrawal | Dheeraj Agrawal | 10/Aug/11 11:58 | 23/Nov/11 14:22 | 14/Aug/11 13:48 | 3.3.3 | 3.4.0 | c client | 0 | 3 | fix for this patch to compile on windows... | 3975 | No Perforce job exists for this issue. | 1 | 33327 | 8 years, 32 weeks, 3 days ago |
Reviewed
|
0|i062fz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1149 | users cannot migrate from 3.4->3.3->3.4 server code against a single datadir |
Task | Closed | Blocker | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 04/Aug/11 18:15 | 11/Oct/13 17:49 | 24/Oct/11 02:49 | 3.4.0, 3.5.0 | 3.4.0, 3.5.0 | server | 0 | 0 | 3.4 is checking acceptedEpoch/currentEpoch files against the snap/log files in datadir. These files are new in 3.4. If they don't exist the server will create them, however if they do exist the server will validate them. As a result if a user 1) upgrades from 3.3 to 3.4 this is fine 2) downgrades from 3.4 to 3.3 this is also fine (3.3 ignores these files) 3) however, 3.4->3.3->3.4 fails because 3.4 will see invalid *Epoch files in the datadir (as 3.3 would have ignored them, applying changes to snap/log w/o updating them) |
163 | No Perforce job exists for this issue. | 0 | 33328 | 6 years, 23 weeks, 6 days ago | The ZooKeeper server cannot be migrated from version 3.4 to version 3.3 and then back to version 3.4 without user intervention. Upgrading from 3.3 to 3.4 is supported, as is downgrading from 3.4 to 3.3. However moving from 3.4 to 3.3 and back to 3.4 will fail. 3.4 is checking the datadir for "acceptedEpoch" and "currentEpoch" files and comparing these against the snapshot and log files contained in the same directory. These epoch files are new in 3.4. As a result: 1) upgrading from 3.3 to 3.4 is fine - the files don't exist, the server creates them 2) downgrading from 3.4 to 3.3 - this is also fine as version 3.3 ignores these files 3) however, 3.4->3.3->3.4 fails because 3.4 will see invalid *Epoch files in the datadir (as 3.3 would have ignored them, applying changes to snap/log w/o updating them) A workaround for this problem is to delete the epoch files if this situation occurs - the version 3.4 server will create them similar to case 1) above. |
Incompatible change
|
0|i062g7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1148 | Multi-threaded handling of reads |
Improvement | Open | Major | Unresolved | Unassigned | Vishal Kathuria | Vishal Kathuria | 04/Aug/11 15:52 | 14/Dec/19 06:07 | 3.7.0 | server | 0 | 1 | This improvement is to take advantage of multiple cores in the machines that typically run ZooKeeper servers to get higher read throughput. The challenge with multiple threads is the read/write ordering guarantees that ZooKeeper provides. One way of handling these is to let readOnly clients use the multiple threads, and the read/write clients continue to use the same single CommitProcessor thread for both reads and writes. For this to work, a client would have to declare its readOnly intent through a flag at connect time. (We already have a readOnly flag, but its intent is a bit different). Another way of honoring the read/write guarantee is to let all sessions start as readOnly sessions and have them use the multi-threaded reads until they do their first write. Once a session performs a write, it automatically flips from a read-only session to a read/write session and starts using the single threaded CommitProcessor. This is a little tricky as one has to worry about in-flight reads when the write comes and we have to make sure those reads finish before the write goes through. I would like to get the community's feedback on whether it would be useful to have this and whether an automatic discovery of readOnly or read/write intent is critical for this to be useful. For us, the clients know at connect time whether they will ever do a write or not, so an automatic detection is of limited use. |
scaling | 2403 | No Perforce job exists for this issue. | 1 | 42046 | 8 years, 31 weeks, 6 days ago | 0|i07k8n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1147 | Add support for local sessions |
Improvement | Resolved | Major | Fixed | Thawan Kooburat | Vishal Kathuria | Vishal Kathuria | 04/Aug/11 15:06 | 22/May/19 17:52 | 09/Oct/13 17:19 | 3.3.3 | 3.5.0 | server | 3 | 17 | 3024000 | 3024000 | 0% | ZOOKEEPER-1851, ZOOKEEPER-1648, ZOOKEEPER-1787, ZOOKEEPER-1607, HBASE-5843 | This improvement is in the bucket of making ZooKeeper work at a large scale. We are planning on having about a 1 million clients connect to a ZooKeeper ensemble through a set of 50-100 observers. Majority of these clients are read only - ie they do not do any updates or create ephemeral nodes. In ZooKeeper today, the client creates a session and the session creation is handled like any other update. In the above use case, the session create/drop workload can easily overwhelm an ensemble. The following is a proposal for a "local session", to support a larger number of connections. 1. The idea is to introduce a new type of session - "local" session. A "local" session doesn't have a full functionality of a normal session. 2. Local sessions cannot create ephemeral nodes. 3. Once a local session is lost, you cannot re-establish it using the session-id/password. The session and its watches are gone for good. 4. When a local session connects, the session info is only maintained on the zookeeper server (in this case, an observer) that it is connected to. The leader is not aware of the creation of such a session and there is no state written to disk. 5. The pings and expiration is handled by the server that the session is connected to. With the above changes, we can make ZooKeeper scale to a much larger number of clients without making the core ensemble a bottleneck. In terms of API, there are two options that are being considered 1. Let the client specify at the connect time which kind of session do they want. 2. All sessions connect as local sessions and automatically get promoted to global sessions when they do an operation that requires a global session (e.g. 
creating an ephemeral node) Chubby took the approach of lazily promoting all sessions to global, but I don't think that would work in our case, where we want to keep sessions which never create ephemeral nodes as always local. Option 2 would make it more broadly usable but option 1 would be easier to implement. We are thinking of implementing option 1 as the first cut. There would be a client flag, IsLocalSession (much like the current readOnly flag) that would be used to determine whether to create a local session or a global session. |
0% | 0% | 3024000 | 3024000 | api-change, scaling | 2404 | No Perforce job exists for this issue. | 9 | 42047 | 43 weeks, 1 day ago | 0|i07k8v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
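The promotion semantics proposed in ZOOKEEPER's local-session description above can be sketched as a toy model. This is an illustrative sketch only; the class and method names (LocalSessionModel, connect, createEphemeral) are hypothetical and are not ZooKeeper's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the two API options from the proposal; all names are hypothetical.
public class LocalSessionModel {
    public enum Type { LOCAL, GLOBAL }

    private final Map<Long, Type> sessions = new HashMap<>();
    private long nextId = 1;

    // Option 1: the client picks the session type at connect time
    // (an IsLocalSession flag, much like the existing readOnly flag).
    public long connect(boolean isLocalSession) {
        long id = nextId++;
        sessions.put(id, isLocalSession ? Type.LOCAL : Type.GLOBAL);
        return id;
    }

    // Option 2: an operation that needs a global session (e.g. creating an
    // ephemeral node) promotes a local session; only at that point would the
    // leader need to learn about the session and log state to disk.
    public void createEphemeral(long sessionId) {
        sessions.put(sessionId, Type.GLOBAL);
    }

    public Type typeOf(long sessionId) {
        return sessions.get(sessionId);
    }
}
```

Under option 2 a session that never creates ephemeral nodes stays local forever, which is exactly the property the reporter wants to preserve for the read-only clients.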
| ZooKeeper | ZOOKEEPER-1146 | significant regression in client (c/python) performance |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 04/Aug/11 13:21 | 23/Nov/11 14:22 | 14/Aug/11 13:30 | 3.4.0 | 3.4.0 | c client | 0 | 2 | I tried running my latency tester against trunk, in so doing I noticed that the C/Python (not sure which yet) client performance has seriously degraded since 3.3.3. The first run (below) is with released 3.3.3 client against a 3 server ensemble running released 3.3.3 server code. The second run is the exact same environment (same ensemble), however using trunk c/zkpython client. Notice: 1) in the first run operations are approx 10ms/write, 0.25ms/read - which is pretty much what's expected. 2) however in the second run we are seeing 50ms/operation regardless of read or write. {noformat} [phunt@c0309 zk-smoketest-3.3.3]$ PYTHONPATH=lib.linux-x86_64-2.6/ LD_LIBRARY_PATH=lib.linux-x86_64-2.6/ python26 ./zk-latencies.py --servers "c0309:2181,c0310:2181,c0311:2181" --znode_size=100 --znode_count=100 --timeout=5000 --synchronous Connecting to c0309:2181 Connected in 16 ms, handle is 0 Connecting to c0310:2181 Connected in 16 ms, handle is 1 Connecting to c0311:2181 Connected in 15 ms, handle is 2 Testing latencies on server c0309:2181 using syncronous calls created 100 permanent znodes in 959 ms (9.599378 ms/op 104.173415/sec) set 100 znodes in 933 ms (9.332101 ms/op 107.157002/sec) get 100 znodes in 27 ms (0.270889 ms/op 3691.551589/sec) deleted 100 permanent znodes in 881 ms (8.812950 ms/op 113.469388/sec) created 100 ephemeral znodes in 956 ms (9.564152 ms/op 104.557103/sec) watched 100 znodes in 26 ms (0.264361 ms/op 3782.707587/sec) deleted 100 ephemeral znodes in 881 ms (8.819292 ms/op 113.387792/sec) notif 100 watches in 999 ms (9.994299 ms/op 100.057038/sec) Testing latencies on server c0310:2181 using syncronous calls created 100 permanent znodes in 964 ms (9.640460 ms/op 103.729490/sec) set 100 znodes in 933 ms (9.332800 ms/op 107.148981/sec) get 100 znodes in 29 ms (0.299308 ms/op 
3341.036650/sec) deleted 100 permanent znodes in 886 ms (8.864651 ms/op 112.807603/sec) created 100 ephemeral znodes in 958 ms (9.585140 ms/op 104.328161/sec) watched 100 znodes in 30 ms (0.300801 ms/op 3324.459240/sec) deleted 100 ephemeral znodes in 886 ms (8.865030 ms/op 112.802779/sec) notif 100 watches in 1000 ms (10.000212 ms/op 99.997878/sec) Testing latencies on server c0311:2181 using syncronous calls created 100 permanent znodes in 958 ms (9.582071 ms/op 104.361569/sec) set 100 znodes in 935 ms (9.359350 ms/op 106.845024/sec) get 100 znodes in 25 ms (0.252700 ms/op 3957.263893/sec) deleted 100 permanent znodes in 891 ms (8.913291 ms/op 112.192013/sec) created 100 ephemeral znodes in 958 ms (9.584489 ms/op 104.335246/sec) watched 100 znodes in 25 ms (0.251091 ms/op 3982.627356/sec) deleted 100 ephemeral znodes in 891 ms (8.915379 ms/op 112.165730/sec) notif 100 watches in 1000 ms (10.000508 ms/op 99.994922/sec) Latency test complete [phunt@c0309 zk-smoketest-3.3.3]$ cd ../zk-smoketest-trunk/ [phunt@c0309 zk-smoketest-trunk]$ PYTHONPATH=lib.linux-x86_64-2.6/ LD_LIBRARY_PATH=lib.linux-x86_64-2.6/ python26 ./zk-latencies.py --servers "c0309:2181,c0310:2181,c0311:2181" --znode_size=100 --znode_count=100 --timeout=5000 --synchronous Connecting to c0309:2181 Connected in 31 ms, handle is 0 Connecting to c0310:2181 Connected in 16 ms, handle is 1 Connecting to c0311:2181 Connected in 16 ms, handle is 2 Testing latencies on server c0309:2181 using syncronous calls created 100 permanent znodes in 5099 ms (50.999281 ms/op 19.608119/sec) set 100 znodes in 5066 ms (50.665429 ms/op 19.737324/sec) get 100 znodes in 4009 ms (40.093150 ms/op 24.941916/sec) deleted 100 permanent znodes in 5040 ms (50.404449 ms/op 19.839519/sec) created 100 ephemeral znodes in 5124 ms (51.249170 ms/op 19.512511/sec) watched 100 znodes in 4051 ms (40.514441 ms/op 24.682557/sec) deleted 100 ephemeral znodes in 5048 ms (50.484939 ms/op 19.807888/sec) notif 100 watches in 1000 ms (10.004182 
ms/op 99.958199/sec) Testing latencies on server c0310:2181 using syncronous calls created 100 permanent znodes in 5115 ms (51.157510 ms/op 19.547472/sec) set 100 znodes in 5056 ms (50.568910 ms/op 19.774996/sec) get 100 znodes in 4099 ms (40.999382 ms/op 24.390612/sec) deleted 100 permanent znodes in 5041 ms (50.418010 ms/op 19.834182/sec) created 100 ephemeral znodes in 5083 ms (50.835850 ms/op 19.671157/sec) watched 100 znodes in 4100 ms (41.003261 ms/op 24.388304/sec) deleted 100 ephemeral znodes in 5058 ms (50.581930 ms/op 19.769906/sec) notif 100 watches in 1000 ms (10.005081 ms/op 99.949219/sec) Testing latencies on server c0311:2181 using syncronous calls created 100 permanent znodes in 5099 ms (50.992720 ms/op 19.610642/sec) set 100 znodes in 5091 ms (50.916569 ms/op 19.639972/sec) get 100 znodes in 4099 ms (40.996401 ms/op 24.392385/sec) deleted 100 permanent znodes in 5066 ms (50.669601 ms/op 19.735699/sec) created 100 ephemeral znodes in 5124 ms (51.249208 ms/op 19.512496/sec) watched 100 znodes in 4099 ms (40.999141 ms/op 24.390755/sec) deleted 100 ephemeral znodes in 5049 ms (50.498819 ms/op 19.802443/sec) notif 100 watches in 999 ms (9.997852 ms/op 100.021486/sec) Latency test complete {noformat} |
3976 | No Perforce job exists for this issue. | 1 | 32691 | 8 years, 29 weeks ago |
Reviewed
|
0|i05yin: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1145 | ObserverTest.testObserver fails at particular point after several runs of ant junit.run -Dtestcase=ObserverTest |
Bug | Closed | Blocker | Duplicate | Vishal Kher | Eugene Joseph Koontz | Eugene Joseph Koontz | 03/Aug/11 18:36 | 23/Nov/11 14:22 | 14/Aug/11 21:02 | 3.4.0 | 3.4.0 | 0 | 0 | ZOOKEEPER-1144 | Use the attached repeat.sh to run ObserverTest repeatedly by doing: src/repeat.sh ObserverTest The test will eventually fail after a few iterations; it should take only a few minutes. The line that fails in the test is: zk = new ZooKeeper("127.0.0.1:" + CLIENT_PORT_OBS, ClientBase.CONNECTION_TIMEOUT, this); Attached as out.txt is the output showing a successful run, for comparison, followed by a failed run. Note that in the following lines, in the seconds before the test fails, there is a 24 second gap in time (between 22:13:02 and 22:13:26): bq. [junit] 2011-08-03 22:13:02,167 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11229:ZooKeeperServer@833] - Client attempting to establish new session at /127.0.0.1:46929 [junit] 2011-08-03 22:13:26,003 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11228:Leader@419] - Shutting down [junit] 2011-08-03 22:13:26,003 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11228:Leader@425] - Shutdown called [junit] java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 1 |
3977 | No Perforce job exists for this issue. | 2 | 32692 | 8 years, 32 weeks, 3 days ago | 0|i05yiv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1144 | ZooKeeperServer not starting on leader due to a race condition |
Bug | Closed | Blocker | Fixed | Vishal Kher | Vishal Kher | Vishal Kher | 03/Aug/11 18:35 | 23/Nov/11 14:22 | 11/Aug/11 14:10 | 3.4.0 | 3.4.0 | 0 | 1 | ZOOKEEPER-1125, ZOOKEEPER-1145 | I have found one problem that is causing QuorumPeerMainTest:testQuorum to fail. This test uses 2 ZK servers. The test is failing because the leader is not starting ZooKeeperServer after leader election, so everything halts. With the new changes, the server is now started in Leader.processAck(), which is called from LearnerHandler. processAck() starts ZooKeeperServer if a majority have acked NEWLEADER. The leader puts its ack in the ackSet in Leader.lead(). Since processAck() is called from LearnerHandler, it can happen that the learner's ack is processed before the leader is able to put its ack in the ackSet. When LearnerHandler invokes processAck(), the ackSet for newLeaderProposal will not have quorum (in this case 2). As a result, the ZooKeeperServer is never started on the Leader. The leader needs to ensure that its ack is put in the ackSet before starting LearnerCnxAcceptor, or invoke processAck() itself after adding to the ackSet. I haven't had time to go through the ZAB2 changes so I am not too familiar with the code. Can Ben/Flavio fix this? |
3978 | No Perforce job exists for this issue. | 1 | 32693 | 8 years, 33 weeks ago | 0|i05yj3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
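The ordering constraint in the ZOOKEEPER-1144 report above (the leader's own ack must land in the ackSet before any LearnerHandler can count acks toward quorum) reduces to a small quorum tracker. A minimal sketch of the suggested fix, with hypothetical names that only model the behavior, not ZooKeeper's actual classes:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of NEWLEADER ack counting; names are illustrative only.
public class NewLeaderQuorum {
    private final Set<Long> ackSet = new HashSet<>();
    private final int quorumSize;
    private boolean serverStarted = false;

    public NewLeaderQuorum(int quorumSize) {
        this.quorumSize = quorumSize;
    }

    // The fix: call this with the leader's own server id *before* starting
    // the learner-handler threads, then again from each learner handler.
    // Whichever call completes the quorum starts the server exactly once.
    public synchronized void processAck(long serverId) {
        ackSet.add(serverId);
        if (!serverStarted && ackSet.size() >= quorumSize) {
            serverStarted = true; // stands in for starting ZooKeeperServer
        }
    }

    public synchronized boolean isStarted() {
        return serverStarted;
    }
}
```

Because the leader's ack is registered first, a learner ack arriving early can no longer be counted against an ackSet that is missing the leader, which is the race the report describes.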
| ZooKeeper | ZOOKEEPER-1143 | quorum send & recv workers are missing thread names |
Improvement | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 03/Aug/11 17:27 | 23/Nov/11 14:22 | 14/Aug/11 20:44 | 3.4.0 | server | 0 | 0 | Simplifies debugging. | 3979 | No Perforce job exists for this issue. | 1 | 33329 | 8 years, 32 weeks, 3 days ago |
Reviewed
|
0|i062gf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1142 | incorrect stat output |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 02/Aug/11 20:29 | 23/Nov/11 14:22 | 11/Aug/11 02:00 | 3.4.0 | 3.4.0 | server | 0 | 1 | stat output seems to be missing some end of line: {noformat} echo stat |nc c0309 2181 Zookeeper version: 3.4.0--1, built on 08/02/2011 22:25 GMT Clients: /172.29.81.91:33378[0](queued=0,recved=1,sent=0 Latency min/avg/max: 0/28/252 Received: 246844 Sent: 266737 Outstanding: 0 Zxid: 0x4000508c2 Mode: follower Node count: 4 {noformat} Multiple clients end up on the same line (missing newline) |
3980 | No Perforce job exists for this issue. | 1 | 32694 | 8 years, 33 weeks ago |
Reviewed
|
0|i05yjb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1141 | zkpython fails tests under python 2.4 |
Bug | Closed | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 02/Aug/11 19:31 | 23/Nov/11 14:22 | 14/Aug/11 20:41 | 3.4.0 | 3.4.0 | contrib-bindings | 0 | 1 | "ant test" under python 2.4 is failing due to a small issue in the test code - using a new feature introduced in 2.5. I have a small patch which addresses this, after which I was able to compile and run the tests successfully under python 2.4. |
3981 | No Perforce job exists for this issue. | 1 | 32695 | 8 years, 32 weeks, 3 days ago |
Reviewed
|
python | 0|i05yjj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1140 | server shutdown is not stopping threads |
Bug | Closed | Blocker | Fixed | Laxman | Patrick D. Hunt | Patrick D. Hunt | 29/Jul/11 12:49 | 23/Nov/11 14:22 | 30/Aug/11 02:37 | 3.4.0 | 3.4.0 | server, tests | 0 | 3 | Near the end of QuorumZxidSyncTest there are tons of threads running - 115 "ProcessThread" threads, similar numbers of SessionTracker. Also I see ~100 ReadOnlyRequestProcessor - why is this running as a separate thread? (henry/flavio?) |
3982 | No Perforce job exists for this issue. | 1 | 32696 | 8 years, 30 weeks, 2 days ago |
Reviewed
|
0|i05yjr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1139 | jenkins is reporting two warnings, fix these |
Bug | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 27/Jul/11 17:42 | 23/Nov/11 14:22 | 28/Jul/11 19:23 | 3.4.0 | 3.4.0 | 0 | 1 | cleanup jenkins report, currently 2 compiler warnings being reported. |
3983 | No Perforce job exists for this issue. | 1 | 32697 | 8 years, 33 weeks, 1 day ago |
Reviewed
|
0|i05yjz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1138 | release audit failing for a number of new files |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 27/Jul/11 16:04 | 23/Nov/11 14:22 | 28/Jul/11 18:06 | 3.4.0 | 3.4.0 | 0 | 1 | I'm seeing a number of problems in the release audit output for 3.4.0, these must be fixed before 3.4.0 release: {noformat} [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/config/defaultConnectionSettings.cfg [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/config/defaultNodeVeiwers.cfg [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/contrib/ZooInspector/licences/epl-v10.html [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/Cli.vcproj [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/include/winconfig.h [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/include/winstdint.h [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/zookeeper.sln [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/c/zookeeper.vcproj [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/huebrowser/zkui/src/zkui/static/help/index.html [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/huebrowser/zkui/src/zkui/static/js/package.yml [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/log4j.properties [rat:report] !????? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/date.format.js [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.bar.js [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.dot.js [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.line.js [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.pie.js [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/g.raphael.js [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/raphael.js [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/loggraph/web/org/apache/zookeeper/graph/resources/yui-min.js [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/monitoring/JMX-RESOURCES [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/config/defaultConnectionSettings.cfg [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/config/defaultNodeVeiwers.cfg [rat:report] !????? 
/grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/lib/log4j.properties [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/contrib/zooinspector/licences/epl-v10.html [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/java/test/org/apache/zookeeper/MultiTransactionRecordTest.java [rat:report] !????? /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build/zookeeper-3.4.0/src/java/test/org/apache/zookeeper/server/quorum/LearnerTest.java Lines that start with ????? in the release audit report indicate files that do not have an Apache license header. {noformat} |
3984 | No Perforce job exists for this issue. | 1 | 32698 | 8 years, 33 weeks, 1 day ago |
Reviewed
|
0|i05yk7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1137 | AuthFLE is throwing NPE when servers are configured with different election ports. |
Bug | Open | Critical | Unresolved | Unassigned | Laxman | Laxman | 27/Jul/11 09:02 | 20/Jun/12 18:27 | 3.3.3 | leaderElection | 0 | 1 | 86400 | 86400 | 0% | AuthFLE is throwing NPE when servers are configured with different election ports. *Configuration* {noformat} server.1 = 10.18.52.25:2888:3888 server.2 = 10.18.52.205:2889:3889 server.3 = 10.18.52.144:2899:3890 {noformat} *Logs* {noformat} 2011-07-22 16:06:22,404 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:65170:AuthFastLeaderElection@844] - Election tally 2011-07-22 16:06:29,483 - ERROR [WorkerSender Thread: 6:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 6,5,main] died java.lang.NullPointerException at org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.process(AuthFastLeaderElection.java:488) at org.apache.zookeeper.server.quorum.AuthFastLeaderElection$Messenger$WorkerSender.run(AuthFastLeaderElection.java:432) at java.lang.Thread.run(Thread.java:619) 2011-07-22 16:06:29,583 - ERROR [WorkerSender Thread: 1:NIOServerCnxn$Factory$1@81] - Thread Thread[WorkerSender Thread: 1,5,main] died java.lang.NullPointerException {noformat} |
0% | 0% | 86400 | 86400 | 31 | No Perforce job exists for this issue. | 3 | 32699 | 8 years, 21 weeks, 6 days ago | Leader Election | 0|i05ykf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1136 | NEW_LEADER should be queued not sent to match the Zab 1.0 protocol on the twiki |
Bug | Closed | Blocker | Fixed | Benjamin Reed | Benjamin Reed | Benjamin Reed | 26/Jul/11 12:58 | 23/Nov/11 14:22 | 14/Sep/11 02:59 | 3.4.0 | 0 | 2 | the NEW_LEADER message was sent at the beginning of the sync phase in Zab pre1.0, but it must be at the end in Zab 1.0. if the protocol is 1.0 or greater we need to queue rather than send the packet. | 3985 | No Perforce job exists for this issue. | 3 | 32700 | 8 years, 21 weeks, 2 days ago |
Reviewed
|
0|i05ykn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1135 | clarify usage of clientPortAddress zoo.cfg option |
Improvement | Open | Major | Unresolved | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 22/Jul/11 19:06 | 22/Jul/11 19:07 | documentation | 1 | 0 | ZOOKEEPER-635 | Documentation should clarify permitted usage of clientPortAddress: Add something like: "You must specify the port and the address separately like so: clientPortAddress=my.hostname.com clientPort=2181 (that is, you can't do "clientPortAddress=my.hostname.com:2181")" |
2405 | No Perforce job exists for this issue. | 0 | 42048 | 8 years, 35 weeks, 6 days ago | 0|i07k93: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1134 | ClientCnxnSocket string comparison using == rather than equals |
Bug | Closed | Critical | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 22/Jul/11 18:18 | 23/Nov/11 14:22 | 25/Jul/11 17:32 | 3.4.0 | 3.4.0 | server | 0 | 1 | Noticed string comparison using == rather than equals. | 3986 | No Perforce job exists for this issue. | 1 | 32701 | 8 years, 35 weeks, 2 days ago |
Reviewed
|
0|i05ykv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
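The bug class in ZOOKEEPER-1134 above is easy to demonstrate: `==` compares references and only happens to work for interned literals, while `equals` compares contents. A small illustration (the class and method names are ours, not from the patch):

```java
// Illustrative only: contrasts reference comparison with value comparison.
public class StringCompare {
    // Can return true for two interned literals, which is why == bugs
    // often survive testing, but fails for strings built at runtime.
    public static boolean sameReference(String a, String b) {
        return a == b;
    }

    // The correct check for string contents.
    public static boolean sameValue(String a, String b) {
        return a != null && a.equals(b);
    }
}
```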
| ZooKeeper | ZOOKEEPER-1133 | ZOOKEEPER-635 allow for "clientPortAddress=host:port" |
Sub-task | Open | Minor | Unresolved | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 22/Jul/11 14:57 | 14/Dec/19 06:08 | 3.7.0 | server | 0 | 0 | 2407 | No Perforce job exists for this issue. | 1 | 42049 | 8 years, 35 weeks, 1 day ago | Currently ZOOKEEPER-635 allows: clientPortAddress=my.host.name clientPort=1234 This patch lets you combine this into a single configuration line: clientPortAddress=my.host.name:1234 |
0|i07k9b: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1132 | ZooKeeper FAQ is out of date wrt testing SessionExpiredException |
Bug | Open | Major | Unresolved | Unassigned | Will Johnson | Will Johnson | 22/Jul/11 08:56 | 22/Jul/11 08:56 | 3.3.3 | documentation | 1 | 1 | See http://markmail.org/thread/vyipodh6ar2b77a3 In addition, this other thread was mentioned as the culprit: http://markmail.org/thread/z5bt4o3quqil7r7t There still seems to be no way to programmatically test SessionExpiredExceptions based on these threads. I'm not sure if that warrants a separate ticket or not. |
documentation, test | 2408 | No Perforce job exists for this issue. | 0 | 32702 | 8 years, 35 weeks, 6 days ago | 0|i05yl3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1131 | Transactions can be dropped because leader election uses last committed zxid instead of last acknowledged/received zxid |
Bug | Resolved | Major | Not A Problem | Unassigned | Alexander Shraer | Alexander Shraer | 21/Jul/11 15:16 | 27/Jul/11 14:59 | 25/Jul/11 18:28 | 3.4.0 | leaderElection, server | 0 | 1 | Suppose we have 3 servers - A, B, C which have seen the same number of commits. - A is the leader and it sends out a new proposal. - B doesn't receive the proposal, but A and C receive and ACK it - A commits the proposal, but fails before anyone else sees the commit. - B and C start leader election. - since both B and C saw the same number of commits, if B has a higher server-id than C, leader election will elect B. Then, the last transaction will be truncated from C's log, which is a bug since it was acked by a majority. This happens since servers propose their last committed zxid in leader election, and not their last received / acked zxid (this is not being tracked, AFAIK). See method FastLeaderElection.getInitLastLoggedZxid(), which calls QuorumPeer.getLastLoggedZxid(), which is supposed to return the last logged Zxid, but instead calls zkDb.getDataTreeLastProcessedZxid() which returns the last committed zxid. |
3987 | No Perforce job exists for this issue. | 0 | 32703 | 8 years, 35 weeks, 1 day ago | 0|i05ylb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
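The ZOOKEEPER-1131 scenario above can be checked against the simplified vote ordering FastLeaderElection uses: the higher zxid wins, with ties broken by server id. A sketch with hypothetical names, showing why proposing the last committed zxid instead of the last acked zxid drops the acked transaction:

```java
// Illustrative model of vote comparison; not ZooKeeper's actual code.
public class ZxidElection {
    // Returns the server id of the election winner between two candidates.
    public static long winner(long sidA, long zxidA, long sidB, long zxidB) {
        if (zxidA != zxidB) {
            return zxidA > zxidB ? sidA : sidB;
        }
        return sidA > sidB ? sidA : sidB; // tie broken by server id
    }
}
```

In the report's setup, C acked a proposal at a zxid one past its last commit. If C advertises only its last committed zxid it ties with B and loses on server id, and the acked transaction is truncated from C's log; advertising the last acked zxid would let C win and preserve it.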
| ZooKeeper | ZOOKEEPER-1130 | Java port of PHunt's zk-smoketest |
New Feature | Open | Major | Unresolved | Colin Goodheart-Smithe | Colin Goodheart-Smithe | Colin Goodheart-Smithe | 21/Jul/11 08:25 | 14/Dec/19 06:08 | 3.4.0 | 3.7.0 | contrib | 0 | 0 | I have ported Patrick's zookeeper smoke test to Java so that it can be run on windows machines (since I couldn't find any way of getting the python bindings for windows). The port provides the same functionality as the python variant as of 21st June 2011. | 32 | No Perforce job exists for this issue. | 4 | 42050 | 8 years, 12 weeks, 3 days ago | 0|i07k9j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1129 | Add RPM/Debian packages to Jenkins |
Task | Resolved | Major | Won't Fix | Unassigned | Eric Yang | Eric Yang | 20/Jul/11 14:41 | 03/Mar/16 11:19 | 03/Mar/16 11:19 | 0 | 0 | For taking advantage of packages generated by ZOOKEEPER-999. It would be nice to setup rpm/debian package build on Jenkins. | 2409 | No Perforce job exists for this issue. | 0 | 42051 | 4 years, 3 weeks ago | 0|i07k9r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1128 | Recipe wrong for Lock process. |
Bug | Resolved | Major | Fixed | yynil | yynil | yynil | 19/Jul/11 12:49 | 02/Mar/16 20:35 | 27/Jul/11 21:21 | 3.3.3 | recipes | 0 | 1 | http://zookeeper.apache.org/doc/trunk/recipes.html The current recipe for Lock has the wrong process. Specifically, for step "4. The client calls exists( ) with the watch flag set on the path in the lock directory with the next lowest sequence number." It shouldn't be "the next lowest sequence number". It should be the "current lowest path". If you're gonna use "the next lowest sequence number", you'll never wait for the lock possession. The following is the test code:
{code:title=LockTest.java|borderStyle=solid}
ACL acl = new ACL(Perms.ALL, new Id("10.0.0.0/8", "1"));
List<ACL> acls = new ArrayList<ACL>();
acls.add(acl);
String connectStr = "localhost:2181";
final Semaphore sem = new Semaphore(0);
ZooKeeper zooKeeper = new ZooKeeper(connectStr, 1000 * 30, new Watcher() {
    @Override
    public void process(WatchedEvent event) {
        System.out.println("eventType:" + event.getType());
        System.out.println("keeperState:" + event.getState());
        if (event.getType() == Event.EventType.None) {
            if (event.getState() == Event.KeeperState.SyncConnected) {
                sem.release();
            }
        }
    }
});
System.out.println("state:" + zooKeeper.getState());
System.out.println("Waiting for the state to be connected");
try {
    sem.acquire();
} catch (InterruptedException ex) {
    ex.printStackTrace();
}
System.out.println("Now state:" + zooKeeper.getState());
String directory = "/_locknode_";
Stat stat = zooKeeper.exists(directory, false);
if (stat == null) {
    zooKeeper.create(directory, new byte[]{}, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
}
String prefix = directory + "/lock-";
String path = zooKeeper.create(prefix, new byte[]{}, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
System.out.println("Create the path for " + path);
while (true) {
    List<String> children = zooKeeper.getChildren(directory, false);
    Collections.sort(children);
    System.out.println("The whole lock size is " + children.size());
    String lowestPath = children.get(0);
    DecimalFormat df = new DecimalFormat("0000000000");
    String currentSuffix = lowestPath.substring("lock-".length());
    System.out.println("CurrentSuffix is " + currentSuffix);
    int intIndex = Integer.parseInt(currentSuffix);
    if (path.equals(directory + "/" + lowestPath)) {
        // I've got the lock and release it
        System.out.println("I've got the lock at " + new Date());
        System.out.println("next index is " + intIndex);
        Thread.sleep(10000);
        System.out.println("After sleep 3 seconds, I'm gonna release the lock");
        zooKeeper.delete(path, -1);
        break;
    }
    final Semaphore wakeupSem = new Semaphore(0);
    stat = zooKeeper.exists(directory + "/" + lowestPath, new Watcher() {
        @Override
        public void process(WatchedEvent event) {
            System.out.println("Event is " + event.getType());
            System.out.println("State is " + event.getState());
            if (event.getType() == Event.EventType.NodeDeleted) {
                wakeupSem.release();
            }
        }
    });
    if (stat != null) {
        System.out.println("Waiting for the delete of ");
        wakeupSem.acquire();
    } else {
        System.out.println("Continue to seek");
    }
}
{code} |
3988 | No Perforce job exists for this issue. | 1 | 32704 | 8 years, 35 weeks ago | 0|i05ylj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1127 | Auth completions are called for every registered auth, and auths are never removed from the auth list (even after they are processed). |
Bug | Open | Critical | Unresolved | Unassigned | Dheeraj Agrawal | Dheeraj Agrawal | 18/Jul/11 14:50 | 26/Dec/12 02:10 | 3.3.3 | c client | 0 | 2 | When we get an auth response, every time we process any auth_response we call ALL the auth completions (which might be registered by different add_auth_info calls). We should be calling only the one that the request came from? I guess we don't know which request the response corresponds to. If the requests are processed in FIFO order and the responses arrive in order, then maybe we can figure out which add_auth_info request the response corresponds to. Also, we never remove entries from the auth_list. The logging is also misleading:
{code}
if (rc) {
    LOG_ERROR(("Authentication scheme %s failed. Connection closed.",
            zh->auth_h.auth->scheme));
}
else {
    LOG_INFO(("Authentication scheme %s succeeded", zh->auth_h.auth->scheme));
}
{code}
If there are multiple auth_infos in the auth_list, we always print success/failure for ONLY the first one. So if I had two auths for schemes ABCD and EFGH and my auth scheme EFGH failed, the logs will still say ABCD failed. |
2410 | No Perforce job exists for this issue. | 0 | 32705 | 7 years, 13 weeks, 1 day ago | 0|i05ylr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1126 | state of zk_handle should NOT be initialized to 0 (CLOSED) in zookeeper_init. It should have an uninitialized state. |
Bug | Resolved | Major | Duplicate | Dheeraj Agrawal | Dheeraj Agrawal | Dheeraj Agrawal | 18/Jul/11 14:47 | 18/Jul/11 17:19 | 18/Jul/11 17:19 | 3.3.3 | c client | 0 | 2 | In zoo_add_auth, we have the following check:
{code}
// [ZOOKEEPER-800] zoo_add_auth should return ZINVALIDSTATE if
// the connection is closed.
if (zoo_state(zh) == 0) {
    return ZINVALIDSTATE;
}
{code}
When we do zookeeper_init, the state is initialized to 0, and above we check if state == 0 and return an error. There is a race condition where the doIo thread is slow and has not yet changed the state to CONNECTING, so you end up getting ZINVALIDSTATE back from zoo_add_auth. The problem is that we use 0 for both the CLOSED state and the UNINITIALIZED state; in the uninitialized case the call should be let through. Is this intentional? In Java the uninitialized state is null. If not intentional, we can initialize the state to some other magic number. |
3989 | No Perforce job exists for this issue. | 0 | 32706 | 8 years, 36 weeks, 3 days ago | 0|i05ylz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
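The fix suggested in ZOOKEEPER-1126 above (stop reusing 0 for both CLOSED and not-yet-initialized) amounts to giving the handle a distinct initial state. A hedged sketch in Java, since the C client's actual state constants are not shown here; all names are illustrative:

```java
// Toy state machine separating "IO thread hasn't run yet" from "closed".
public class HandleState {
    public enum State { UNINITIALIZED, CONNECTING, CONNECTED, CLOSED }

    private State state = State.UNINITIALIZED; // distinct from CLOSED

    public void setState(State s) {
        state = s;
    }

    // An add-auth-style call is rejected only for a genuinely closed handle;
    // an uninitialized handle (the race in the report) is allowed to proceed.
    public boolean addAuthAllowed() {
        return state != State.CLOSED;
    }

    public State getState() {
        return state;
    }
}
```

With a shared 0 value, the check `state == 0` cannot tell the two cases apart, which is exactly the race the report describes when zoo_add_auth runs before the doIo thread sets CONNECTING.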
| ZooKeeper | ZOOKEEPER-1125 | Intermittent java core test failures |
Bug | Resolved | Major | Not A Problem | Vishal Kher | Vishal Kher | Vishal Kher | 13/Jul/11 16:33 | 15/May/14 18:53 | 15/May/14 18:53 | 3.5.0 | tests | 2 | 3 | ZOOKEEPER-1144, ZOOKEEPER-1055 | Some of the tests are consistently failing for me and intermittently on hudson. Posting discussion from mailing list below. Vishal, Can you please open a jira for this and mark it as a blocker for 3.4 release? Looks like its transient: https://builds.apache.org/job/ZooKeeper-trunk/ The latest build is passing. thanks mahadev - Hide quoted text - On Mon, Jul 11, 2011 at 12:49 PM, Vishal Kher <vishalmlst@gmail.com> wrote: > Hi, > > ant test-core-java is consistently failing for me. > > The error seems to be either: > > Testcase: testFollowersStartAfterLeader took 35.577 sec > Caused an ERROR > Did not connect > java.util.concurrent.TimeoutException: Did not connect > at > org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:124) > at > org.apache.zookeeper.test.QuorumTest.testFollowersStartAfterLeader(QuorumTest.java:308) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > > or > > Testcase: testNoLogBeforeLeaderEstablishment took 8.831 sec > Caused an ERROR > KeeperErrorCode = ConnectionLoss for /blah > org.apache.zookeeper.KeeperException$ConnectionLossException: > KeeperErrorCode = ConnectionLoss for /blah > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761) > at > org.apache.zookeeper.test.QuorumTest.testNoLogBeforeLeaderEstablishment(QuorumTest.java:385) > at > org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) > > Looks like the reason why the tests are failing for me is similar to why the > tests failed on hudson: > > 2011-07-11 14:47:26,219 [myid:] - INFO 
[QuorumPeer[myid=2]/0.0.0.0:11379 > :Leader@425] - Shutdown called > java.lang.Exception: shutdown Leader! reason: Only 0 followers, need 1 > at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:425) > at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:400) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:729) > 2011-07-11 14:47:26,220 [myid:] - INFO [QuorumPeer[myid=2]/0.0.0.0:11379 > :ZooKeeperServer@416] - shutting down > > The leader is not able to ping the followers. Has anyone seen this before? > > Thanks. > -Vishal > > On Sun, Jul 10, 2011 at 6:52 AM, Apache Jenkins Server < > jenkins@builds.apache.org> wrote: > >> See https://builds.apache.org/job/ZooKeeper-trunk/1239/ >> >> >> ################################################################################### >> ########################## LAST 60 LINES OF THE CONSOLE >> ########################### >> [...truncated 242795 lines...] >> [junit] 2011-07-10 10:57:16,673 [myid:] - INFO >> [main:SessionTrackerImpl@206] - Shutting down >> [junit] 2011-07-10 10:57:16,673 [myid:] - INFO >> [main:PrepRequestProcessor@702] - Shutting down >> [junit] 2011-07-10 10:57:16,674 [myid:] - INFO >> [main:SyncRequestProcessor@170] - Shutting down >> [junit] 2011-07-10 10:57:16,674 [myid:] - INFO >> [SyncThread:0:SyncRequestProcessor@152] - SyncRequestProcessor exited! >> [junit] 2011-07-10 10:57:16,675 [myid:] - INFO >> [main:FinalRequestProcessor@423] - shutdown of request processor complete >> [junit] 2011-07-10 10:57:16,674 [myid:] - INFO [ProcessThread(sid:0 >> cport:-1)::PrepRequestProcessor@133] - PrepRequestProcessor exited loop! 
>> [junit] 2011-07-10 10:57:16,676 [myid:] - INFO [main:ClientBase@227] - >> connecting to 127.0.0.1 11221 >> [junit] ensureOnly:[] >> [junit] 2011-07-10 10:57:16,677 [myid:] - INFO [main:ClientBase@428] - >> STARTING server >> [junit] 2011-07-10 10:57:16,678 [myid:] - INFO >> [main:ZooKeeperServer@164] - Created server with tickTime 3000 >> minSessionTimeout 6000 maxSessionTimeout 60000 datadir >> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2 >> snapdir >> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2 >> [junit] 2011-07-10 10:57:16,679 [myid:] - INFO >> [main:NIOServerCnxnFactory@94] - binding to port 0.0.0.0/0.0.0.0:11221 >> [junit] 2011-07-10 10:57:16,680 [myid:] - INFO [main:FileSnap@83] - >> Reading snapshot >> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build/test/tmp/test1139867753736175617.junit.dir/version-2/snapshot.b >> [junit] 2011-07-10 10:57:16,683 [myid:] - INFO [main:FileTxnSnapLog@256] >> - Snapshotting: b >> [junit] 2011-07-10 10:57:16,684 [myid:] - INFO [main:ClientBase@227] - >> connecting to 127.0.0.1 11221 >> [junit] 2011-07-10 10:57:16,685 [myid:] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@197] - Accepted socket >> connection from /127.0.0.1:45122 >> [junit] 2011-07-10 10:57:16,686 [myid:] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:11221:NIOServerCnxn@815] - Processing stat command from / >> 127.0.0.1:45122 >> [junit] 2011-07-10 10:57:16,686 [myid:] - INFO >> [Thread-5:NIOServerCnxn$StatCommand@652] - Stat command output >> [junit] 2011-07-10 10:57:16,688 [myid:] - INFO >> [Thread-5:NIOServerCnxn@995] - Closed socket connection for client / >> 127.0.0.1:45122 (no session established for client) >> [junit] ensureOnly:[InMemoryDataTree, StandaloneServer_port] >> [junit] expect:InMemoryDataTree >> [junit] found:InMemoryDataTree >> 
org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree >> [junit] expect:StandaloneServer_port >> [junit] found:StandaloneServer_port >> org.apache.ZooKeeperService:name0=StandaloneServer_port-1 >> [junit] 2011-07-10 10:57:16,690 [myid:] - INFO >> [main:JUnit4ZKTestRunner$LoggedInvokeMethod@57] - FINISHED TEST METHOD >> testQuota >> [junit] 2011-07-10 10:57:16,690 [myid:] - INFO [main:ClientBase@465] - >> tearDown starting >> [junit] 2011-07-10 10:57:16,754 [myid:] - INFO [main:ZooKeeper@662] - >> Session: 0x13113b1aca50000 closed >> [junit] 2011-07-10 10:57:16,754 [myid:] - INFO >> [main-EventThread:ClientCnxn$EventThread@495] - EventThread shut down >> [junit] 2011-07-10 10:57:16,754 [myid:] - INFO [main:ClientBase@435] - >> STOPPING server >> [junit] 2011-07-10 10:57:16,755 [myid:] - INFO [NIOServerCxn.Factory: >> 0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@224] - NIOServerCnxn factory >> exited run method >> [junit] 2011-07-10 10:57:16,755 [myid:] - INFO >> [main:ZooKeeperServer@416] - shutting down >> [junit] 2011-07-10 10:57:16,756 [myid:] - INFO >> [main:SessionTrackerImpl@206] - Shutting down >> [junit] 2011-07-10 10:57:16,756 [myid:] - INFO >> [main:PrepRequestProcessor@702] - Shutting down >> [junit] 2011-07-10 10:57:16,757 [myid:] - INFO >> [main:SyncRequestProcessor@170] - Shutting down >> [junit] 2011-07-10 10:57:16,760 [myid:] - INFO [ProcessThread(sid:0 >> cport:-1)::PrepRequestProcessor@133] - PrepRequestProcessor exited loop! >> [junit] 2011-07-10 10:57:16,762 [myid:] - INFO >> [SyncThread:0:SyncRequestProcessor@152] - SyncRequestProcessor exited! 
>> [junit] 2011-07-10 10:57:16,762 [myid:] - INFO >> [main:FinalRequestProcessor@423] - shutdown of request processor complete >> [junit] 2011-07-10 10:57:16,763 [myid:] - INFO [main:ClientBase@227] - >> connecting to 127.0.0.1 11221 >> [junit] ensureOnly:[] >> [junit] 2011-07-10 10:57:16,767 [myid:] - INFO [main:ClientBase@493] - >> fdcount after test is: 35 at start it was 24 >> [junit] 2011-07-10 10:57:16,767 [myid:] - INFO [main:ClientBase@495] - >> sleeping for 20 secs >> [junit] 2011-07-10 10:57:16,768 [myid:] - INFO [main:ZKTestCase$1@60] >> - SUCCEEDED testQuota >> [junit] 2011-07-10 10:57:16,768 [myid:] - INFO [main:ZKTestCase$1@55] >> - FINISHED testQuota >> [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.691 sec >> >> BUILD FAILED >> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build.xml:959: >> The following error occurred while executing this line: >> /grid/0/hudson/hudson-slave/workspace/ZooKeeper-trunk/trunk/build.xml:870: >> Tests failed! >> >> Total time: 19 minutes 0 seconds >> [FINDBUGS] Skipping publisher since build result is FAILURE >> [WARNINGS] Skipping publisher since build result is FAILURE >> Recording fingerprints >> Archiving artifacts >> Recording test results >> Publishing Javadoc >> Publishing Clover coverage report... >> No Clover report will be published due to a Build Failure >> Email was triggered for: Failure >> Sending email for trigger: Failure >> >> >> >> >> ################################################################################### >> ############################## FAILED TESTS (if any) >> ############################## >> 2 tests failed. 
>> REGRESSION: org.apache.zookeeper.test.ObserverTest.testObserver >> >> Error Message: >> KeeperErrorCode = ConnectionLoss for /obstest >> >> Stack Trace: >> org.apache.zookeeper.KeeperException$ConnectionLossException: >> KeeperErrorCode = ConnectionLoss for /obstest >> at >> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) >> at >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761) >> at >> org.apache.zookeeper.test.ObserverTest.testObserver(ObserverTest.java:101) >> at >> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) >> >> >> REGRESSION: org.apache.zookeeper.test.ReadOnlyModeTest.testSeekForRwServer >> >> Error Message: >> KeeperErrorCode = ConnectionLoss for /test >> >> Stack Trace: >> org.apache.zookeeper.KeeperException$ConnectionLossException: >> KeeperErrorCode = ConnectionLoss for /test >> at >> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) >> at >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >> at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:761) >> at >> org.apache.zookeeper.test.ReadOnlyModeTest.testSeekForRwServer(ReadOnlyModeTest.java:213) >> at >> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52) |
179 | No Perforce job exists for this issue. | 4 | 32707 | 5 years, 45 weeks ago | 0|i05ym7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1124 | Multiop submitted to non-leader always fails due to timeout |
Bug | Closed | Critical | Fixed | Marshall McMullen | Marshall McMullen | Marshall McMullen | 13/Jul/11 12:17 | 23/Nov/11 14:22 | 15/Jul/11 00:51 | 3.4.0 | 3.4.0 | server | 0 | 1 | ZOOKEEPER-965 | all | The new Multiop support added under zookeeper-965 fails every single time if the multiop is submitted to a non-leader in quorum mode. In standalone mode it always works properly and this bug only presents itself in quorum mode (with 2 or more nodes). After 12 hours of debugging (*sigh*) it turns out to be a really simple fix. There are a couple of missing case statements inside FollowerRequestProcessor.java and ObserverRequestProcessor.java to ensure that multiop is forwarded to the leader for commit. I've attached a patch that fixes this problem. It's probably worth noting that zookeeper-965 has already been committed to trunk. But this is a fatal flaw that will prevent multiop support from working properly and as such needs to get committed to 3.4.0 as well. Is there a way to tie these two cases together in some way? |
3990 | No Perforce job exists for this issue. | 1 | 32708 | 8 years, 36 weeks, 6 days ago |
Reviewed
|
0|i05ymf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
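The fix described in ZOOKEEPER-1124 amounts to adding the multi op to the set of operations a follower forwards to the leader. A hypothetical sketch (in Python, with illustrative op names — the real code is a switch in FollowerRequestProcessor.java):

```python
# Hypothetical sketch of the dispatch flaw in ZOOKEEPER-1124: a follower
# forwards write operations to the leader, but the new "multi" op type was
# missing from the dispatch, so multi requests were never forwarded.
WRITE_OPS_BUGGY = {"create", "delete", "setData"}           # "multi" missing
WRITE_OPS_FIXED = {"create", "delete", "setData", "multi"}  # "multi" added

def forwards_to_leader(op, write_ops):
    """Return True if the follower would forward this op to the leader."""
    return op in write_ops

# A multi submitted to a follower stalls with the buggy table ...
buggy = forwards_to_leader("multi", WRITE_OPS_BUGGY)
# ... and reaches the leader for commit once the case is added.
fixed = forwards_to_leader("multi", WRITE_OPS_FIXED)
```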
| ZooKeeper | ZOOKEEPER-1123 | Can't connect to ZooKeeper server with the C client library from Solaris: connect() call fails. |
Bug | Open | Major | Unresolved | Unassigned | Tadeusz Andrzej Kadłubowski | Tadeusz Andrzej Kadłubowski | 12/Jul/11 07:03 | 08/Feb/12 14:28 | 3.3.3 | c client | 0 | 1 | Client: Solaris 5.10, x86 machine. Server: Linux Fedora 14 |
I have a C app that runs on Solaris and connects to ZooKeeper which I run on Linux (just a single server instance, that's just a development setup). Upon calling zookeeper_init() I get logs that say connect() call fails. TCP-wise the client sends RST packet instead of the third part of the three-way handshake. Traced client syscalls below. Sometimes the client is able to establish a connection - after half an hour of trying, or even longer. Logs ==== The client logs: 2011-07-11 16:20:22,954:13148(0xf):ZOO_ERROR@handle_socket_error_msg@1501: Socket [10.10.1.71:2181] zk retcode=-4, errno=0(Error 0): connect() call failed The server logs: 2011-07-11 16:20:22,950 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /10.10.9.27:34017 2011-07-11 16:20:22,955 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket 2011-07-11 16:20:22,955 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /10.10.9.27:34017 (no session established for client) Syscalls in the client: /15: 3516.6191 so_socket(PF_INET, SOCK_STREAM, IPPROTO_IP, "", SOV_DEFAULT) = 11 /15: 3516.6192 setsockopt(11, tcp, TCP_NODELAY, 0xFD8A8ECC, 4, SOV_DEFAULT) = 0 /15: 3516.6193 fcntl(11, F_GETFL) = 2 /15: 3516.6194 fcntl(11, F_SETFL, FWRITE|FNONBLOCK) = 0 /15: 3516.6194 connect(11, 0x0813BA30, 16, SOV_DEFAULT) Err#150 EINPROGRESS /15: 3516.6195 write(2, " 2 0 1 1 - 0 7 - 1 2 1".., 23) = 23 <<< SNIP writing log message >>> /15: 3516.6204 write(2, "\n", 1) = 1 /15: 3516.6205 close(11) = 0 What does work: =============== Using Java client on the same Solaris machine works without any problems. Connecting to the Linux server using C client library on Linux works OK (though I tested it within one box, via loopback interface). |
2411 | No Perforce job exists for this issue. | 0 | 32709 | 8 years, 7 weeks, 1 day ago | 0|i05ymn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
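The syscall trace in ZOOKEEPER-1123 shows connect() returning EINPROGRESS, immediately followed by a log write and close(). On a non-blocking socket, EINPROGRESS is the normal "handshake in flight" result, not a failure: the caller is expected to wait for writability and then check SO_ERROR. A minimal sketch of that pattern (in Python, for illustration — the C client has its own event loop and error handling):

```python
import errno
import select
import socket

def connect_nonblocking(host, port, timeout=2.0):
    """Connect a non-blocking socket, treating EINPROGRESS as 'in flight'."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(False)
    err = s.connect_ex((host, port))
    # EINPROGRESS (or EWOULDBLOCK on some platforms) means the handshake
    # is still in flight, not that the connect failed.
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        s.close()
        raise OSError(err, "connect failed immediately")
    _, writable, _ = select.select([], [s], [], timeout)
    if not writable:
        s.close()
        raise TimeoutError("connect timed out")
    # Writability alone is not success: read the deferred error code.
    err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        s.close()
        raise OSError(err, "connect failed")
    return s
```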
| ZooKeeper | ZOOKEEPER-1122 | "start" and "stop" commands are not present in zkServer.cmd |
Improvement | Open | Major | Unresolved | Unassigned | Alexander Osadchiy | Alexander Osadchiy | 11/Jul/11 08:17 | 14/Dec/19 06:08 | 3.3.3 | 3.7.0 | scripts | 4 | 10 | Windows | Currently the ZooKeeper server can be started and stopped on Unix-based systems using the script "bin/zkServer.sh": bin/zkServer.sh start - to start the server; bin/zkServer.sh stop - to stop the server. There are no "start" and "stop" commands in the script "zkServer.cmd" (for Windows). |
patch | 2412 | No Perforce job exists for this issue. | 2 | 42052 | 2 years, 39 weeks, 3 days ago | 0|i07k9z: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1121 | Data cleanup / Eviction policy |
Wish | Open | Major | Unresolved | Unassigned | Matthew | Matthew | 08/Jul/11 15:13 | 08/Jul/11 15:13 | server | 0 | 1 | We are using zookeeper to store versions of business objects in order to achieve coherence, distributed locks, etc. These business objects have limited lifespans (i.e. objects created over a week ago are rarely accessed), so effectively, after some time period, we do not need their information in zookeeper anymore. It would be nice to have a built-in tool or mechanism for expiring old data, much like how PurgeTxnLog cleans the snapshot and transaction log files. Any thoughts on whether this can be supported or how it can be accomplished? Currently we are walking the tree and deleting nodes with an old mtime. |
2413 | No Perforce job exists for this issue. | 0 | 42053 | 8 years, 37 weeks, 6 days ago | eviction policy | 0|i07ka7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
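The manual workaround described in ZOOKEEPER-1121 — walking the tree and deleting nodes with an old mtime — can be sketched as follows. This is a hypothetical Python model over a plain dict; the real job would use the ZooKeeper client API and the mtime field of each node's Stat:

```python
# Hypothetical cleanup sketch: walk a tree of znodes and delete those whose
# mtime is older than a cutoff. Node layout and field names are illustrative.
import time

def expire_old_nodes(tree, cutoff, deleted=None):
    """tree: {path: {"mtime": seconds, "children": {...}}}; deletes bottom-up."""
    if deleted is None:
        deleted = []
    for path, node in list(tree.items()):
        expire_old_nodes(node.get("children", {}), cutoff, deleted)
        # only delete a node once its children are gone (as ZK's delete requires)
        if node["mtime"] < cutoff and not node.get("children"):
            del tree[path]
            deleted.append(path)
    return deleted

now = time.time()
week = 7 * 24 * 3600
tree = {
    "/orders": {"mtime": now - 2 * week, "children": {
        "/orders/1": {"mtime": now - 2 * week, "children": {}},
    }},
    "/live": {"mtime": now, "children": {}},
}
gone = expire_old_nodes(tree, now - week)
```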
| ZooKeeper | ZOOKEEPER-1120 | recipes haven't been built in distribution package |
Task | Resolved | Major | Duplicate | Unassigned | Yanming Zhou | Yanming Zhou | 07/Jul/11 20:49 | 14/Dec/12 21:09 | 05/Jan/12 12:22 | 3.3.3 | build, recipes | 0 | 1 | I have downloaded zookeeper-3.3.3.tar.gz, and have not found zookeeper-recipes.jar in dist-maven, so I tried to build it myself: D:\packages\zookeeper-3.3.3\recipes\lock>ant Buildfile: D:\packages\zookeeper-3.3.3\recipes\lock\build.xml BUILD FAILED D:\packages\zookeeper-3.3.3\recipes\lock\build.xml:19: Cannot find D:\packages\zookeeper-3.3.3\recipes\build-recipes.xml imported from D:\packages\zookeeper-3.3.3\recipes\lock\build.xml Total time: 0 seconds recipes/build-recipes.xml isn't included in zookeeper-3.3.3.tar.gz |
2414 | No Perforce job exists for this issue. | 1 | 33330 | 7 years, 14 weeks, 5 days ago | 0|i062gn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1119 | zkServer stop command incorrectly reading comment lines in zoo.cfg |
Bug | Closed | Major | Fixed | Patrick D. Hunt | Glen Mazza | Glen Mazza | 07/Jul/11 06:33 | 23/Nov/11 14:22 | 25/Jul/11 18:22 | 3.3.3 | 3.4.0 | scripts | 0 | 1 | Ubuntu Linux 10.04, JDK 6 | Hello, adding the following commented-out dataDir to the zoo.cfg file (keeping the default one provided active): {noformat} # the directory where the snapshot is stored. # dataDir=test123/data dataDir=/export/crawlspace/mahadev/zookeeper/server1/data {noformat} and then running sh zkServer.sh stop is showing that the program is incorrectly reading the commented-out dataDir: {noformat} gmazza@gmazza-work:~/dataExt3/apps/zookeeper-3.3.3/bin$ sh zkServer.sh stop JMX enabled by default Using config: /media/NewDriveExt3_/apps/zookeeper-3.3.3/bin/../conf/zoo.cfg Stopping zookeeper ... error: could not find file test123/data /export/crawlspace/mahadev/zookeeper/server1/data/zookeeper_server.pid gmazza@gmazza-work:~/dataExt3/apps/zookeeper-3.3.3/bin$ {noformat} If I change the commented-out line in zoo.cfg to "test123456/data" and run the stop command again I get: error: could not find file test123456/data showing that it's incorrectly doing a run-time read of the commented-out lines. (Difficult to completely confirm, but this problem doesn't appear to occur with the start command, only the stop one.) |
3991 | No Perforce job exists for this issue. | 1 | 32710 | 8 years, 35 weeks, 2 days ago |
Reviewed
|
0|i05ymv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
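The parsing bug in ZOOKEEPER-1119 — a substring match that also picks up commented-out dataDir lines — can be shown with a minimal sketch (in Python, for illustration; zkServer.sh does the equivalent with grep/sed):

```python
# Naive substring matching picks up the commented-out dataDir line;
# anchoring the match to the start of the line does not.
zoo_cfg = """\
# the directory where the snapshot is stored.
# dataDir=test123/data
dataDir=/export/crawlspace/mahadev/zookeeper/server1/data
"""

def datadirs_naive(cfg):
    # buggy: matches the commented-out line too
    return [l.split("=", 1)[1] for l in cfg.splitlines() if "dataDir=" in l]

def datadirs_fixed(cfg):
    # fixed: only lines that *start* with dataDir=
    return [l.split("=", 1)[1] for l in cfg.splitlines() if l.startswith("dataDir=")]
```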
| ZooKeeper | ZOOKEEPER-1118 | Inconsistent data after server crashes several times |
Bug | Resolved | Critical | Duplicate | Unassigned | Kurt Young | Kurt Young | 05/Jul/11 22:56 | 06/Jul/11 21:11 | 06/Jul/11 09:32 | 3.3.2 | quorum | 0 | 0 | Redhat RHEL5 | I think there is a bug when the Follower tries to sync data with the Leader. Assume some operations were committed while one server was down. When the server restarts, it will receive a NEWLEADER packet which includes the last zxid of the leader, and the server will set its own lastProcessZxid to the leader's. {code:title=Follower.java|borderStyle=solid} void followLeader() throws InterruptedException { fzk.registerJMX(new FollowerBean(this, zk), self.jmxLocalPeerBean); try { InetSocketAddress addr = findLeader(); try { connectToLeader(addr); long newLeaderZxid = registerWithLeader(Leader.FOLLOWERINFO); // get the last zxid from leader //check to see if the leader zxid is lower than ours //this should never happen but is just a safety check long lastLoggedZxid = self.getLastLoggedZxid(); if ((newLeaderZxid >> 32L) < (lastLoggedZxid >> 32L)) { LOG.fatal("Leader epoch " + Long.toHexString(newLeaderZxid >> 32L) + " is less than our epoch " + Long.toHexString(lastLoggedZxid >> 32L)); throw new IOException("Error: Epoch of leader is lower"); } syncWithLeader(newLeaderZxid); // set its own lastProcessZxid to leader's last zxid {code} Then, some COMMIT packets will be received by the server in order to sync its data with the leader. After that, the leader will send an UPTODATE packet telling the server to take a snapshot. 
{code:title=Follower.java|borderStyle=solid} protected void processPacket(QuorumPacket qp) throws IOException{ switch (qp.getType()) { case Leader.PING: ping(qp); break; case Leader.PROPOSAL: TxnHeader hdr = new TxnHeader(); BinaryInputArchive ia = BinaryInputArchive .getArchive(new ByteArrayInputStream(qp.getData())); Record txn = SerializeUtils.deserializeTxn(ia, hdr); if (hdr.getZxid() != lastQueued + 1) { LOG.warn("Got zxid 0x" + Long.toHexString(hdr.getZxid()) + " expected 0x" + Long.toHexString(lastQueued + 1)); } lastQueued = hdr.getZxid(); fzk.logRequest(hdr, txn); break; case Leader.COMMIT: fzk.commit(qp.getZxid()); break; case Leader.UPTODATE: fzk.takeSnapshot(); self.cnxnFactory.setZooKeeperServer(fzk); break; case Leader.REVALIDATE: revalidate(qp); break; case Leader.SYNC: fzk.sync(); break; } } {code} Notice the different ways the Follower treats the COMMIT and UPTODATE packets. When it receives a COMMIT packet, the follower hands it to a processor to deal with. But when it receives an UPTODATE packet, the follower takes a snapshot immediately. So it is possible that the server takes a snapshot before it has committed all the operations it missed. Then, if the server crashes again and recovers, it will restore its data from the snapshot, so its data is now inconsistent with the leader's, but its last zxid is the same. |
3992 | No Perforce job exists for this issue. | 0 | 32711 | 8 years, 38 weeks ago | 0|i05yn3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
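The race described in ZOOKEEPER-1118 can be modeled in a few lines. This is an illustrative Python sketch, not ZooKeeper code: COMMITs are queued for an asynchronous processor while the UPTODATE snapshot is taken immediately, so the snapshot can record the leader's last zxid while missing the queued transactions:

```python
def sync_with_leader(commits, commit_is_async):
    """Model of the follower's sync: returns (snapshot_zxid, snapshot_txns)."""
    applied = []   # txns applied to the in-memory tree
    queued = []    # txns handed to the async commit processor
    last_zxid = 0
    for zxid, txn in commits:
        last_zxid = zxid
        (queued if commit_is_async else applied).append(txn)
    # UPTODATE arrives: the snapshot is taken immediately
    snapshot = (last_zxid, list(applied))
    applied.extend(queued)  # the async processor catches up only afterwards
    return snapshot

async_snap = sync_with_leader([(1, "a"), (2, "b")], True)
sync_snap = sync_with_leader([(1, "a"), (2, "b")], False)
# same zxid, different snapshot contents -> divergence after a crash
```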
| ZooKeeper | ZOOKEEPER-1117 | zookeeper 3.3.3 fails to build with gcc >= 4.6.1 on Debian/Ubuntu |
Bug | Closed | Minor | Fixed | James Page | James Page | James Page | 05/Jul/11 10:50 | 23/Nov/11 14:22 | 26/Aug/11 03:52 | 3.3.3, 3.4.0 | 3.3.4, 3.4.0 | c client | 0 | 2 | Ubuntu Developement Release (11.10/Oneiric Ocelot), Debian Unstable (sid) | zookeeper 3.3.3 (and 3.3.1) fails to build on Debian and Ubuntu systems with gcc >= 4.6.1: /bin/bash ./libtool --tag=CC --mode=compile gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c -o zookeeper.lo `test -f 'src/zookeeper.c' || echo './'`src/zookeeper.c libtool: compile: gcc -DHAVE_CONFIG_H -I. -I./include -I./tests -I./generated -Wall -Werror -g -O2 -D_GNU_SOURCE -MT zookeeper.lo -MD -MP -MF .deps/zookeeper.Tpo -c src/zookeeper.c -fPIC -DPIC -o .libs/zookeeper.o src/zookeeper.c: In function 'getaddrs': src/zookeeper.c:455:13: error: variable 'port' set but not used [-Werror=unused-but-set-variable] cc1: all warnings being treated as errors See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625441 for more information. |
3993 | No Perforce job exists for this issue. | 7 | 32712 | 8 years, 30 weeks, 6 days ago |
Reviewed
|
0|i05ynb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1116 | Add MX4J Support to Zookeeper |
Improvement | Open | Minor | Unresolved | Unassigned | Erez Mazor | Erez Mazor | 03/Jul/11 03:43 | 03/Jul/11 03:43 | 3.3.4 | server | 0 | 0 | It would be great to add MX4J support to ZooKeeper. If possible, it could be inspired by the way Cassandra loads mx4j (which only starts if the mx4j jar is in the classpath; see https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/Mx4jTool.java) |
2415 | No Perforce job exists for this issue. | 0 | 42054 | 8 years, 38 weeks, 4 days ago | 0|i07kaf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1115 | follower can not sync with leader |
Bug | Resolved | Critical | Not A Problem | Unassigned | helei | helei | 01/Jul/11 03:20 | 21/Oct/13 23:06 | 10/Oct/13 16:38 | 3.3.0, 3.3.3 | quorum | 0 | 6 | ZOOKEEPER-1548 | linux rhel 4, x64, java version 1.4.2 | exception causing shutdown. There are 5 members in the quorum. One follower can not sync with the leader after a restart. It seems the leader has closed the data connection with this follower because of a read timeout. Here is the key log from the follower: {noformat} 2011-06-30 22:14:45,069 - WARN [Thread-17:QuorumCnxManager$RecvWorker@658] - Connection broken: java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:113) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:156) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629) 2011-06-30 22:14:45,069 - INFO [QuorumPeer:/0.0.0.0:2181:FastLeaderElection@689] - Notification: 3, 17198470148, 3, 3, LOOKING, LOOKING, 3 2011-06-30 22:14:45,070 - ERROR [Thread-16:QuorumCnxManager$SendWorker@559] - Failed to send last message. Shutting down thread. 
java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:126) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:548) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:557) 2011-06-30 22:14:45,082 - INFO [QuorumPeer:/0.0.0.0:2181:Learner@282] - Getting a diff from the leader 0x4011bd462 2011-06-30 22:14:45,083 - WARN [Thread-18:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2011-06-30 22:14:45,085 - WARN [QuorumPeer:/0.0.0.0:2181:Follower@116] - Got zxid 0x4011bd405 expected 0x1 2011-06-30 22:14:45,090 - INFO [QuorumPeer:/0.0.0.0:2181:FileTxnSnapLog@208] - Snapshotting: 4011bd462 2011-06-30 22:14:53,397 - WARN [SyncThread:3:SendAckRequestProcessor@63] - Closing connection to leader, exception during packet send java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126) at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:61) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:164) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98) 2011-06-30 22:14:53,398 - WARN [QuorumPeer:/0.0.0.0:2181:Follower@82] - Exception when following the leader java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126) at org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:358) at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:108) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:79) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:634) 2011-06-30 22:14:53,398 - WARN [SyncThread:3:SendAckRequestProcessor@63] - Closing connection to leader, exception during packet send java.net.SocketException: Socket closed at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:99) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126) at org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:61) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:164) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98) 2011-06-30 22:14:53,399 - INFO [QuorumPeer:/0.0.0.0:2181:Follower@166] - shutdown called java.lang.Exception: shutdown Follower at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:638) and these are the leader's: 2011-06-30 22:14:35,943 - ERROR [LearnerHandler-/10.23.247.163:14975:LearnerHandler@444] - Unexpected exception causing shutdown while sock still open java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at 
java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readInt(DataInputStream.java:370) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358) 2011-06-30 22:14:35,943 - WARN [LearnerHandler-/10.23.247.163:14975:LearnerHandler@457] - ******* GOODBYE /10.23.247.163:14975 ******** 2011-06-30 22:14:48,943 - ERROR [CommitProcessor:4:NIOServerCnxn@422] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1360) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367) at org.apache.zookeeper.server.quorum.Leader$ToBeAppliedRequestProcessor.processRequest(Leader.java:535) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) 2011-06-30 22:14:49,084 - ERROR [LearnerHandler-/10.23.247.163:14998:LearnerHandler@444] - Unexpected exception causing shutdown while sock still open java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) at java.io.DataInputStream.readInt(DataInputStream.java:370) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at 
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:358) 2011-06-30 22:14:49,084 - WARN [LearnerHandler-/10.23.247.163:14998:LearnerHandler@457] - ******* GOODBYE /10.23.247.163:14998 ******** {noformat} |
2416 | No Perforce job exists for this issue. | 0 | 32713 | 6 years, 24 weeks ago | 0|i05ynj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1114 | Concurrent primitives library - barrier |
Improvement | Open | Trivial | Unresolved | Unassigned | Chia-Hung Lin | Chia-Hung Lin | 01/Jul/11 00:22 | 01/Jul/11 00:23 | recipes | 0 | 1 | GNU/ Debian java 1.6.0_21 zookeeper trunk (svn info shows Revision: 1141788) |
The patch is provided according to the wiki[1]. The source follows the description in the tutorial[2]. However, the mailing list shows that this version is not optimized[3]. Is there any chance someone can point out which algorithm would give a better result for this construct? I am happy to work on it, though it may take some time. [1]. http://wiki.apache.org/hadoop/ZooKeeper/SoC2010Ideas#Concurrent_Primitives_Library [2]. http://zookeeper.apache.org/doc/current/zookeeperTutorial.html#sc_barriers [3]. http://mail-archives.apache.org/mod_mbox/zookeeper-user/201102.mbox/%3C87184214-59D4-4D64-A884-A6F07CE0F239@yahoo-inc.com%3E |
gsoc2010 | 2417 | No Perforce job exists for this issue. | 1 | 42055 | 8 years, 38 weeks, 6 days ago | 0|i07kan: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
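For reference, the barrier semantics the ZOOKEEPER-1114 recipe targets — every participant blocks until all n have arrived, then all proceed — can be sketched with plain threads. This is an in-process illustration only, not the ZooKeeper-based recipe:

```python
import threading

class SimpleBarrier:
    """All n participants block in enter() until the last one arrives."""
    def __init__(self, n):
        self.n = n
        self.count = 0
        self.cond = threading.Condition()

    def enter(self):
        with self.cond:
            self.count += 1
            if self.count == self.n:
                self.cond.notify_all()   # last arrival releases everyone
            else:
                self.cond.wait_for(lambda: self.count >= self.n)
```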
| ZooKeeper | ZOOKEEPER-1113 | ZOOKEEPER-107 QuorumMaj counts the number of ACKs but does not check who sent the ACK |
Sub-task | Resolved | Minor | Fixed | Alexander Shraer | Alexander Shraer | Alexander Shraer | 30/Jun/11 19:00 | 07/Mar/13 01:46 | 07/Mar/13 01:46 | 3.5.0 | quorum | 0 | 3 | ZOOKEEPER-107 | If a server connects to the leader as follower, it will be allowed to vote (with QuorumMaj) even if it is not a follower in the current configuration, as the leader does not care who sends the ACK - it only counts the number of ACKs. |
2418 | No Perforce job exists for this issue. | 0 | 42056 | 7 years, 3 weeks ago | 0|i07kav: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
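The flaw in ZOOKEEPER-1113 is easy to state as a predicate. A hypothetical Python sketch (not the QuorumMaj code): counting raw ACKs admits votes from servers outside the configuration, while intersecting with the current voter set does not:

```python
# Counting ACKs alone lets a server outside the current configuration
# contribute to a quorum; the fix counts only ACKs from configured voters.
def has_quorum_buggy(acks, voters):
    # any sender counts, even one not in the configuration
    return len(acks) > len(voters) // 2

def has_quorum_fixed(acks, voters):
    # only ACKs from servers in the current voting set count
    return len(set(acks) & voters) > len(voters) // 2

voters = {1, 2, 3}
acks = [4, 5]   # ACKs from servers not in the current configuration
```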
| ZooKeeper | ZOOKEEPER-1112 | Add support for C client for SASL authentication |
New Feature | Resolved | Major | Fixed | Damien Diederen | Eugene Joseph Koontz | Eugene Joseph Koontz | 30/Jun/11 18:13 | 22/Jan/20 10:10 | 22/Jan/20 06:55 | 3.7.0 | 2 | 12 | 0 | 18000 | ZOOKEEPER-938 | Hopefully this would leverage the SASL server-side support provided by ZOOKEEPER-938. It would be similar to the Java SASL client support also provided in ZOOKEEPER-938. Java has built-in SASL support, but I'm not sure what C libraries are available for SASL and, if any exist, whether they are compatible with the Apache license. |
100% | 100% | 18000 | 0 | pull-request-available | 2419 | No Perforce job exists for this issue. | 7 | 42057 | 31 weeks, 3 days ago | 0|i07kb3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1111 | JMXEnv uses System.err instead of logging |
Bug | Closed | Major | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 30/Jun/11 04:51 | 23/Nov/11 14:22 | 19/Jul/11 17:39 | 3.4.0 | 0 | 1 | BOOKKEEPER-30 | As stated in the title, org.apache.zookeeper.test.JMXEnv uses System.err.println to output traces. This makes for a lot of noise on the console when you run the tests. It has a logging object already, so it should use that instead. | 3994 | No Perforce job exists for this issue. | 1 | 32714 | 8 years, 36 weeks, 1 day ago |
Reviewed
|
0|i05ynr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1110 | C interface zookeeper_close closes fd too quickly |
Bug | Resolved | Major | Invalid | Unassigned | xiliu | xiliu | 29/Jun/11 01:30 | 29/Jul/12 01:37 | 25/Apr/12 03:55 | 3.3.3 | c client | 1 | 1 | 1800 | 1800 | 0% | linux platform. | The correct way to close a client is for the client to send CLOSE_OP to the server and then wait several seconds; the server will process the close request and close the fd. But the zookeeper_close interface gets this wrong, because adaptor_send_queue(zh, 3000) (line 2332) first waits out the timeout and then sends the request. The right order is to send the request first and then wait out the timeout. I changed it as follows: $svn diff src/c/src/zookeeper.c Index: src/c/src/zookeeper.c =================================================================== --- src/c/src/zookeeper.c (revision 1140451) +++ src/c/src/zookeeper.c (working copy) @@ -2329,7 +2329,8 @@ /* make sure the close request is sent; we set timeout to an arbitrary * (but reasonable) number of milliseconds since we want the call to block*/ - rc=adaptor_send_queue(zh, 3000); + rc=adaptor_send_queue(zh, 0); + sleep(3); }else{ LOG_INFO(("Freeing zookeeper resources for sessionId=%#llx\n", zh->client_id.client_id)); |
0% | 0% | 1800 | 1800 | 2420 | No Perforce job exists for this issue. | 0 | 32715 | 7 years, 48 weeks, 1 day ago | 0|i05ynz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1109 | Zookeeper service is down when SyncRequestProcessor meets any exception. |
Bug | Closed | Critical | Fixed | Laxman | Laxman | Laxman | 24/Jun/11 00:48 | 23/Nov/11 14:22 | 25/Jul/11 17:01 | 3.3.0, 3.3.1, 3.3.2, 3.3.3 | 3.4.0 | quorum | 0 | 4 | 259200 | 259200 | 0% | ZOOKEEPER-121 | *Problem* ZooKeeper is not shut down completely when the dataDir disk is full, and the ZK cluster goes into an unserviceable state. *Scenario* If the leader's disk becomes full, ZooKeeper tries to shut down, but it waits indefinitely while shutting down the SyncRequestProcessor thread. *Root Cause* this.join() is invoked in the same thread where System.exit(11) was triggered. When the disk fills up, a 'No space left on device' exception is raised and System.exit(11) is invoked from the SyncRequestProcessor thread (the following logs show the same). Before exiting the JVM, ZK executes the ShutdownHook of QuorumPeerMain, and the flow reaches SyncRequestProcessor.shutdown(). Here this.join() is invoked in the same thread where System.exit(11) was invoked. |
0% | 0% | 259200 | 259200 | 3995 | No Perforce job exists for this issue. | 2 | 32716 | 8 years, 35 weeks, 2 days ago |
Reviewed
|
quorum, leader, disk full, shutdown | 0|i05yo7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
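The root cause in ZOOKEEPER-1109 above — a shutdown path joining the very thread that triggered the exit — can be illustrated with a small, hypothetical sketch (names are illustrative, not ZooKeeper's actual SyncRequestProcessor): Thread.join() called from the thread itself waits for its own death and never returns, so guarding against self-join is one way to avoid the hang.

```java
// Hypothetical sketch, not the real SyncRequestProcessor: shutdown() joins
// the worker only when invoked from a *different* thread; joining from the
// worker itself would block forever, which is the hang in this report.
public class SafeShutdownSketch {
    private volatile boolean running = true;
    private final Thread worker;

    public SafeShutdownSketch() {
        worker = new Thread(() -> {
            while (running) {
                try {
                    Thread.sleep(5);        // stand-in for the processor loop
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        worker.start();
    }

    public void shutdown() throws InterruptedException {
        running = false;
        worker.interrupt();
        // Guard: join only from another thread. A thread calling join() on
        // itself waits for its own termination and never returns.
        if (Thread.currentThread() != worker) {
            worker.join();
        }
    }

    public boolean stopped() {
        return !worker.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        SafeShutdownSketch s = new SafeShutdownSketch();
        s.shutdown();
        System.out.println(s.stopped()); // true once join() has returned
    }
}
```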
| ZooKeeper | ZOOKEEPER-1108 | Various bugs in zoo_add_auth in C |
Bug | Closed | Blocker | Fixed | Dheeraj Agrawal | Dheeraj Agrawal | Dheeraj Agrawal | 23/Jun/11 17:02 | 23/Nov/11 14:22 | 08/Sep/11 22:27 | 3.3.3 | 3.4.0 | c client | 0 | 6 | 3 issues: In zoo_add_auth: there is a race condition: 2940 // [ZOOKEEPER-800] zoo_add_auth should return ZINVALIDSTATE if 2941 // the connection is closed. 2942 if (zoo_state(zh) == 0) { 2943 return ZINVALIDSTATE; 2944 } when we do zookeeper_init, the state is initialized to 0 and above we check if state = 0 then throw exception. There is a race condition where the doIo thread is slow and has not changed the state to CONNECTING, then you end up returning back ZKINVALIDSTATE. The problem is we use 0 for CLOSED state and UNINITIALIZED state. in case of uninitialized case it should let it go through. 2nd issue: Another Bug: in send_auth_info, the check is not correct while (auth->next != NULL) { //--BUG: in cases where there is only one auth in the list, this will never send that auth, as its next will be NULL rc = send_info_packet(zh, auth); auth = auth->next; } FIX IS: do { rc = send_info_packet(zh, auth); auth = auth->next; } while (auth != NULL); //this will make sure that even if there is one auth ,that will get sent. 3rd issue: 2965 add_last_auth(&zh->auth_h, authinfo); 2966 zoo_unlock_auth(zh); 2967 2968 if(zh->state == ZOO_CONNECTED_STATE || zh->state == ZOO_ASSOCIATING_STATE) 2969 return send_last_auth_info(zh); if it is connected, we only send the last_auth_info, which may be different than the one we added, as we unlocked it before sending it. |
3996 | No Perforce job exists for this issue. | 5 | 32717 | 8 years, 28 weeks, 6 days ago |
Reviewed
|
0|i05yof: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
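The second issue in ZOOKEEPER-1108 above is a classic traversal off-by-one. It is sketched here in Java for brevity (the real code is C, and these names are illustrative): `while (node.next != null)` skips the last element — and skips a single-element list entirely — while the do/while form visits every node.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the send_auth_info bug, transposed to Java: the
// while-loop form stops one node early; do/while processes every node.
public class AuthListSketch {
    public static class Node {
        final String payload;
        Node next;
        public Node(String payload) { this.payload = payload; }
    }

    // Buggy: never sends the last auth; sends nothing for a one-node list.
    public static List<String> sendBuggy(Node auth) {
        List<String> sent = new ArrayList<>();
        while (auth.next != null) {
            sent.add(auth.payload);
            auth = auth.next;
        }
        return sent;
    }

    // Fixed: do/while guarantees the current node is always processed.
    public static List<String> sendFixed(Node auth) {
        List<String> sent = new ArrayList<>();
        do {
            sent.add(auth.payload);
            auth = auth.next;
        } while (auth != null);
        return sent;
    }

    public static void main(String[] args) {
        Node single = new Node("only-auth");
        System.out.println(sendBuggy(single)); // []          -- the bug
        System.out.println(sendFixed(single)); // [only-auth] -- the fix
    }
}
```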
| ZooKeeper | ZOOKEEPER-1107 | automating log and snapshot cleaning |
New Feature | Closed | Major | Fixed | Laxman | Jun Rao | Jun Rao | 23/Jun/11 10:48 | 23/Nov/11 14:22 | 02/Sep/11 16:51 | 3.3.3 | 3.4.0 | server | 3 | 4 | ZOOKEEPER-323 | I like to have ZK itself manage the amount of snapshots and logs kept, instead of relying on the PurgeTxnLog utility. |
3997 | No Perforce job exists for this issue. | 8 | 33331 | 8 years, 29 weeks, 5 days ago |
Reviewed
|
0|i062gv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1106 | mt C client dumps core when creating a node |
Bug | Open | Major | Unresolved | zhang yafei | jiang guangran | jiang guangran | 23/Jun/11 02:15 | 18/Mar/16 13:36 | 3.3.2 | c client | 0 | 0 | In deserialize_CreateResponse: rc = rc ? : in->deserialize_String(in, "path", &v->path); in deserialize_String, len = -1, so v->path is left uninitialised and is then freed, causing a core dump. do_io thread: #0 0x00000039fb030265 in raise () from /lib64/libc.so.6 #1 0x00000039fb031d10 in abort () from /lib64/libc.so.6 #2 0x00000039fb06a84b in __libc_message () from /lib64/libc.so.6 #3 0x00000039fb0722ef in _int_free () from /lib64/libc.so.6 #4 0x00000039fb07273b in free () from /lib64/libc.so.6 #5 0x00002b0afd755dd1 in deallocate_String (s=0x5a490f40) at src/recordio.c:29 #6 0x00002b0afd754ade in zookeeper_process (zh=0x131e3870, events=<value optimized out>) at src/zookeeper.c:2071 #7 0x00002b0afd75b2ef in do_io (v=<value optimized out>) at src/mt_adaptor.c:310 #8 0x00000039fb8064a7 in start_thread () from /lib64/libpthread.so.0 #9 0x00000039fb0d3c2d in clone () from /lib64/libc.so.6 create_node thread: #0 0x00000039fb80ab99 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002b0afd75af5c in wait_sync_completion (sc=0x131e4c90) at src/mt_adaptor.c:82 #2 0x00002b0afd751750 in zoo_create (zh=0x131e3870, path=0x13206fa8 "/jsq/zr2/hb/10.250.8.139:8102", value=0x131e86a8 "\n\021\061\060.250.8.139:8102\022\035/home/shaoqiang/workdir2/qrs/\030\001 \001*%\n\020\n", valuelen=102, acl=0x2b0afd961700, flags=1, path_buffer=0x0, path_buffer_len=0) at src/zookeeper.c:3028 |
2421 | No Perforce job exists for this issue. | 1 | 32718 | 4 years, 6 days ago | 0|i05yon: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1105 | C client zookeeper_close does not send CLOSE_OP request to server |
Bug | Closed | Major | Fixed | Mate Szalay-Beko | jiang guangran | jiang guangran | 23/Jun/11 02:05 | 14/Feb/20 10:23 | 05/Feb/20 03:33 | 3.3.2, 3.4.3 | 3.6.0, 3.5.7, 3.7.0 | c client | 5 | 15 | 0 | 12600 | ZOOKEEPER-3645 | The zookeeper_close function calls adaptor_finish before sending the CLOSE_OP request to the server, so the CLOSE_OP request is never sent. The server's zookeeper.log contains many entries like: 2011-06-22 00:23:02,323 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@634] - EndOfStreamException: Unable to read additional data from client sessionid 0x1305970d66d2224, likely client has closed socket 2011-06-22 00:23:02,324 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1435] - Closed socket connection for client /10.250.8.123:60257 which had sessionid 0x1305970d66d2224 2011-06-22 00:23:02,325 - ERROR [CommitProcessor:1:NIOServerCnxn@445] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:418) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1509) at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) The Java client does not have this problem. |
100% | 100% | 12600 | 0 | pull-request-available | 2422 | No Perforce job exists for this issue. | 5 | 2377 | 7 weeks, 2 days ago | 0|i00rfr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1104 | CLONE - In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in testFollowersStartAfterLeaders as in testSessionMove. |
Improvement | Closed | Minor | Fixed | Eugene Joseph Koontz | sreekanth | sreekanth | 22/Jun/11 14:21 | 23/Nov/11 14:22 | 14/Aug/11 20:54 | 3.4.0 | 3.4.0 | tests | 0 | 0 | ZOOKEEPER-1103 | Patrick Hunt writes: "Such uses of sleep [used in testFollowersStartAfterLeader] are just asking for trouble. Take a look at the use of sleep in testSessionMove in the same class for a better way to do this. I had gone through all the tests a while back, replacing all the "sleep(x)" with something like this testSessionMove pattern (retry with a max limit that's very long). During reviews we should look for anti-patterns like this and address them before commit." So, modify testFollowersStartAfterLeaders to use the same retrying approach that testSessionMove uses. |
47478 | No Perforce job exists for this issue. | 3 | 33332 | 8 years, 32 weeks, 3 days ago | Contains improvement to original patch (remove unneeded boolean variable). |
Reviewed
|
0|i062h3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1103 | In QuorumTest, use the same "for ( .. try { break } catch { } )" pattern in testFollowersStartAfterLeaders as in testSessionMove. |
Improvement | Closed | Minor | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 21/Jun/11 17:23 | 23/Nov/11 14:22 | 22/Jun/11 15:47 | 3.3.3, 3.4.0 | 3.3.4, 3.4.0 | tests | 0 | 0 | ZOOKEEPER-1104 | Patrick Hunt writes: "Such uses of sleep [used in testFollowersStartAfterLeader] are just asking for trouble. Take a look at the use of sleep in testSessionMove in the same class for a better way to do this. I had gone through all the tests a while back, replacing all the "sleep(x)" with something like this testSessionMove pattern (retry with a max limit that's very long). During reviews we should look for anti-patterns like this and address them before commit." So, modify testFollowersStartAfterLeaders to use the same retrying approach that testSessionMove uses. |
47479 | No Perforce job exists for this issue. | 4 | 33333 | 8 years, 40 weeks, 1 day ago |
Reviewed
|
0|i062hb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
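The retry idiom recommended in ZOOKEEPER-1103/1104 above — attempt, break on success, catch and retry up to a generous cap, instead of a single fixed sleep(x) — can be sketched as follows (a hypothetical helper, not the actual QuorumTest code):

```java
// Hypothetical sketch of the testSessionMove-style pattern: retry the flaky
// operation in a loop, return on success, back off and retry on failure,
// and give up only after a very long cap.
public class RetrySketch {
    public interface Attempt {
        void run() throws Exception;   // throws while the cluster isn't ready
    }

    public static int retry(Attempt attempt, int maxTries, long sleepMillis)
            throws Exception {
        Exception last = null;
        for (int tries = 1; tries <= maxTries; tries++) {
            try {
                attempt.run();
                return tries;          // success: stop retrying
            } catch (Exception e) {
                last = e;              // not ready yet: back off, try again
                Thread.sleep(sleepMillis);
            }
        }
        throw last;                    // cap exceeded: surface the failure
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Succeeds on the third attempt, as a slow-starting follower might.
        int tries = retry(() -> {
            if (++calls[0] < 3) throw new IllegalStateException("not ready");
        }, 100, 1);
        System.out.println(tries); // 3
    }
}
```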
| ZooKeeper | ZOOKEEPER-1102 | Need update for programmer manual to cover multi operation |
Bug | Open | Major | Unresolved | Unassigned | Ted Dunning | Ted Dunning | 21/Jun/11 13:53 | 08/Aug/11 14:05 | 0 | 0 | The new multi operation is undocumented as yet. Clearly it needs some doc to cover: 1) the basic syntax 2) java code sample 3) C code sample |
2423 | No Perforce job exists for this issue. | 0 | 32719 | 8 years, 40 weeks, 2 days ago | 0|i05yov: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1101 | Upload zookeeper-test maven artifacts to maven repository. |
Bug | Closed | Major | Fixed | Patrick D. Hunt | Ivan Kelly | Ivan Kelly | 21/Jun/11 13:15 | 23/Nov/11 14:22 | 01/Aug/11 14:31 | 3.4.0 | 0 | 1 | These are generated by ant package since ZOOKEEPER-1042, they just need to be pushed to a maven repo. Bookkeeper requires this package to build. | 47480 | No Perforce job exists for this issue. | 0 | 32720 | 8 years, 34 weeks, 3 days ago |
Reviewed
|
0|i05yp3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1100 | Killed (or missing) SendThread will cause hanging threads |
Bug | Resolved | Major | Fixed | Camille Fournier | Gunnar Wagenknecht | Gunnar Wagenknecht | 21/Jun/11 05:24 | 02/Mar/16 20:37 | 26/Dec/11 10:56 | 3.3.3 | 3.5.0 | java client | 0 | 5 | ZOOKEEPER-1186 | http://mail-archives.apache.org/mod_mbox/zookeeper-user/201106.mbox/%3Citpgb6$2mi$1@dough.gmane.org%3E | After investigating an issue with [hanging threads|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201106.mbox/%3Citpgb6$2mi$1@dough.gmane.org%3E] I noticed that any java.lang.Error can silently kill the SendThread. Without a SendThread, any thread that wants to send something will hang forever, and currently nobody will recognize that the SendThread has died. I think at least a state should be flipped (or a flag set) that causes all further send attempts to fail, or the connection loop should be re-spun. |
2424 | No Perforce job exists for this issue. | 2 | 32721 | 8 years, 1 week, 1 day ago |
Incompatible change
|
0|i05ypb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
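The mitigation suggested in ZOOKEEPER-1100 above — flip a flag when the send thread dies so later senders fail fast — can be sketched with an uncaught-exception handler (a hypothetical sketch, not the real ClientCnxn):

```java
// Hypothetical sketch, not the real ClientCnxn: an uncaught-exception
// handler on the send thread flips a "dead" flag, so later senders fail
// fast instead of hanging forever on a thread that no longer exists.
public class DeadSenderSketch {
    private volatile boolean dead = false;
    private final Thread sendThread;

    public DeadSenderSketch(Runnable sendLoop) {
        sendThread = new Thread(sendLoop, "send-thread");
        // Any Throwable escaping the loop marks the connection unusable.
        sendThread.setUncaughtExceptionHandler((t, e) -> dead = true);
        sendThread.start();
    }

    public void awaitDeath() throws InterruptedException {
        sendThread.join();
    }

    public void send(String packet) {
        if (dead) {
            throw new IllegalStateException(
                "send thread died; re-spin the connection");
        }
        // ... hand the packet to the send thread's queue ...
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a send loop killed by a java.lang.Error.
        DeadSenderSketch c = new DeadSenderSketch(() -> {
            throw new AssertionError("killed");
        });
        c.awaitDeath();
        try {
            c.send("ping");
        } catch (IllegalStateException e) {
            System.out.println("fail-fast: " + e.getMessage());
        }
    }
}
```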
| ZooKeeper | ZOOKEEPER-1099 | Add simple examples to show the usage of zookeeper |
Improvement | Open | Minor | Unresolved | Unassigned | divya | divya | 20/Jun/11 14:48 | 20/Jun/11 14:52 | java client | 0 | 0 | We used zookeeper to make one of our service highly available. I have written a sample program which shows the usage of zookeeper to make the required service highly available . Please review the client code attached . | 2425 | No Perforce job exists for this issue. | 1 | 42058 | 8 years, 40 weeks, 3 days ago | 0|i07kbb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1098 | Upload native libraries as Maven artifacts |
New Feature | Resolved | Minor | Duplicate | Unassigned | Joey Echeverria | Joey Echeverria | 20/Jun/11 09:34 | 16/Jul/14 16:43 | 23/Apr/14 18:21 | 3.5.0 | 0 | 5 | HBase is planning to make use of the native ZooKeeper libraries in order to have small session timeouts that aren't affected by GC pauses (see HBASE-1316). The current patch uses a custom maven packaging of the ZooKeeper native libraries. It would be nice if ZooKeeper published those artifacts as part of its release process. | 2426 | No Perforce job exists for this issue. | 0 | 42059 | 5 years, 36 weeks, 1 day ago | 0|i07kbj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1097 | Quota is not correctly rehydrated on snapshot reload |
Bug | Closed | Blocker | Fixed | Camille Fournier | Camille Fournier | Camille Fournier | 16/Jun/11 10:07 | 23/Nov/11 14:22 | 26/Jun/11 19:30 | 3.3.3, 3.4.0 | 3.3.4, 3.4.0 | server | 0 | 1 | traverseNode in DataTree will never actually traverse the limit nodes properly. | 47481 | No Perforce job exists for this issue. | 7 | 32722 | 8 years, 39 weeks, 4 days ago |
Reviewed
|
0|i05ypj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1096 | Leader communication should listen on specified IP, not wildcard address |
Improvement | Closed | Minor | Fixed | Germán Blanco | Jared Cantwell | Jared Cantwell | 15/Jun/11 14:59 | 13/Mar/14 14:16 | 25/Sep/13 18:14 | 3.3.3, 3.4.0 | 3.4.6, 3.5.0 | server | 4 | 7 | ZOOKEEPER-1711 | Server should specify the local address that is used for leader communication and leader election (and not use the default of listening on all interfaces). This is similar to the clientPortAddress parameter that was added a year ago. After reviewing the code, we can't think of a reason why only the port would be used with the wildcard interface, when servers are already connecting specifically to that interface anyway. I have submitted a patch, but it does not account for all leader election algorithms. Probably should have an option to toggle this, for backwards compatibility, although it seems like it would be a bug if this change broke things. There is some more information about making it an option here: http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3CAANLkTikkT97Djqt3CU=H2+7Gnj_4p28hgCXjh345HiyN@mail.gmail.com%3E |
33 | No Perforce job exists for this issue. | 8 | 42060 | 6 years, 2 weeks ago | 0|i07kbr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
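The improvement in ZOOKEEPER-1096 above amounts to binding the leader/election listener the way clientPortAddress already works — to a configured local address rather than the wildcard. A minimal, hypothetical sketch (the helper name is illustrative):

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch: bind the listener to a configured local address
// (here the loopback, with an OS-chosen port) instead of the wildcard
// 0.0.0.0 that listens on every interface.
public class BindAddressSketch {
    public static ServerSocket listenOn(InetAddress addr) throws Exception {
        ServerSocket ss = new ServerSocket();
        ss.bind(new InetSocketAddress(addr, 0)); // port 0 = ephemeral
        return ss;
    }

    public static void main(String[] args) throws Exception {
        ServerSocket ss = listenOn(InetAddress.getLoopbackAddress());
        // Bound to 127.0.0.1, not the any-address.
        System.out.println(ss.getInetAddress().isAnyLocalAddress()); // false
        ss.close();
    }
}
```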
| ZooKeeper | ZOOKEEPER-1095 | Simple leader election recipe |
Improvement | Closed | Major | Fixed | Eric Sammer | Henry Robinson | Henry Robinson | 15/Jun/11 13:04 | 17/May/14 08:03 | 07/Jul/11 18:55 | 3.3.3 | 3.4.0 | 2 | 5 | HDFS-1973, ZOOKEEPER-1080 | Leader election recipe originally contributed to ZOOKEEPER-1080. | 47482 | No Perforce job exists for this issue. | 2 | 33334 | 8 years, 37 weeks, 6 days ago | Adds an implementation of the leader election recipe |
Reviewed
|
0|i062hj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1094 | Small improvements to LeaderElection and Vote classes |
Improvement | Closed | Minor | Fixed | Henry Robinson | Henry Robinson | Henry Robinson | 13/Jun/11 18:14 | 23/Nov/11 14:22 | 16/Jun/11 19:34 | 3.4.0 | quorum | 0 | 0 | 1. o.a.z.q.Vote is a struct-style class, whose fields are public and not final. In general, we should prefer making the fields of these kind of classes final, and hiding them behind getters for the following reasons: * Marking them as final allows clients of the class not to worry about any synchronisation when accessing the fields * Hiding them behind getters allows us to change the implementation of the class without changing the API. Object creation is very cheap. It's ok to create new Votes rather than mutate existing ones. 2. Votes are mainly used in the LeaderElection class. In this class a map of addresses to votes is passed in to countVotes, which modifies the map contents inside an iterator (and therefore changes the object passed in by reference). This is pretty gross, so at the same time I've slightly refactored this method to return information about the number of validVotes in the ElectionResult class, which is returned by countVotes. 3. The previous implementation of countVotes was quadratic in the number of votes. It is possible to do this linearly. No real speed-up is expected as a result, but it salves the CS OCD in me :) |
47483 | No Perforce job exists for this issue. | 2 | 33335 | 8 years, 41 weeks ago |
Reviewed
|
0|i062hr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
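Point 3 of ZOOKEEPER-1094 above — replacing the quadratic pairwise tally with a linear one — can be sketched like this (illustrative names, not the actual countVotes implementation):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: tallying votes pairwise is O(n^2); one pass over a
// HashMap keyed by candidate id produces the same counts in O(n).
public class VoteCountSketch {
    public static long winner(long[] votes) {
        Map<Long, Integer> tally = new HashMap<>();
        long best = -1;
        int bestCount = 0;
        for (long v : votes) {                        // single linear pass
            int c = tally.merge(v, 1, Integer::sum);  // increment candidate
            if (c > bestCount) {
                bestCount = c;
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(winner(new long[]{1, 2, 2, 3, 2})); // 2
    }
}
```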
| ZooKeeper | ZOOKEEPER-1093 | ZooKeeper quotas will always trigger if set on one criteria but not the other |
Bug | Resolved | Major | Duplicate | Camille Fournier | Camille Fournier | Camille Fournier | 13/Jun/11 16:05 | 19/Jul/11 19:55 | 19/Jul/11 19:55 | 3.3.3, 3.4.0 | server | 0 | 0 | /testing has quota on bytes but not node count. Count quota will always fire because it is set to -1 and will always fail comparison. 2011-06-13 16:01:53,492 - WARN [CommitProcessor:3:DataTree@373] - Quota exceeded: /testing count=4 limit=-1 |
67451 | No Perforce job exists for this issue. | 1 | 32723 | 8 years, 36 weeks, 2 days ago | 0|i05ypr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
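The fix implied by ZOOKEEPER-1093 above is a sentinel check before comparing — sketched here as a hypothetical helper, not the actual DataTree code:

```java
// Hypothetical helper: a quota limit of -1 means "not set", so it must be
// excluded before comparing, or every usage count "exceeds" it.
public class QuotaCheckSketch {
    public static boolean exceeded(long used, long limit) {
        return limit >= 0 && used > limit;
    }

    public static void main(String[] args) {
        System.out.println(exceeded(4, -1)); // false: no count quota set
        System.out.println(exceeded(4, 3));  // true: real limit exceeded
    }
}
```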
| ZooKeeper | ZOOKEEPER-1092 | get rid of pending changes |
Improvement | Open | Minor | Unresolved | Unassigned | Benjamin Reed | Benjamin Reed | 11/Jun/11 21:49 | 04/Nov/11 12:20 | 0 | 1 | ZOOKEEPER-1285 | pending changes used by PrepRequestProcessor and FinalRequestProcessor is complicated and requires synchronization between threads. | 2427 | No Perforce job exists for this issue. | 0 | 42061 | 8 years, 20 weeks, 6 days ago | 0|i07kbz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1091 | when the chrootPath of ClientCnxn is not null and the watches of ZooKeeper are not null, and the method primeConnection(SelectionKey k) of ClientCnxn occurs again for some reason, the wrong watcher clientPath is sent to the server |
Bug | Closed | Critical | Duplicate | Unassigned | zhangyouming | zhangyouming | 09/Jun/11 23:21 | 23/Nov/11 14:22 | 16/Oct/11 14:21 | 3.3.3 | 3.4.0 | java client | 0 | 3 | 3600 | 3600 | 0% | Linux version 2.6.18-194.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Fri Apr 2 14:58:14 EDT 2010 | If the chrootPath of ClientCnxn is not null and the watches of ZooKeeper are not null, then for some reason (such as the ZooKeeper server stopping and starting) the client will primeConnection to the server again and tell the server the watcher paths. But the path is wrong: it should be the server path, not the client path. When the wrong watcher clientPath is sent to the server, the following exception occurs: 2011-06-10 04:33:16,935 [pool-2-thread-30-SendThread(DB1-6:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x5302c4403a30232 for server DB1-6/192.168.1.6:2181, unexpected error, closing socket connection and attempting reconnect java.lang.StringIndexOutOfBoundsException: String index out of range: -6 at java.lang.String.substring(String.java:1937) at java.lang.String.substring(String.java:1904) at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:794) at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:881) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130) |
0% | 0% | 3600 | 3600 | 2428 | No Perforce job exists for this issue. | 0 | 32724 | 8 years, 26 weeks, 6 days ago | 0|i05ypz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
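The translation the client gets wrong in ZOOKEEPER-1091 above can be sketched as follows (a hypothetical helper; the real logic lives in ClientCnxn):

```java
// Hypothetical helper: watches must be re-registered under the *server*
// path (chroot prepended). Sending the bare client path makes the later
// chroot-stripping substring() underflow, as in the stack trace above.
public class ChrootPathSketch {
    public static String toServerPath(String chroot, String clientPath) {
        if (chroot == null || chroot.isEmpty()) {
            return clientPath;                 // no chroot configured
        }
        return "/".equals(clientPath) ? chroot : chroot + clientPath;
    }

    public static void main(String[] args) {
        System.out.println(toServerPath("/app", "/node")); // /app/node
        System.out.println(toServerPath("/app", "/"));     // /app
        System.out.println(toServerPath(null, "/node"));   // /node
    }
}
```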
| ZooKeeper | ZOOKEEPER-1090 | Race condition while taking snapshot can lead to not restoring data tree correctly |
Bug | Closed | Critical | Fixed | Vishal Kher | Vishal Kher | Vishal Kher | 09/Jun/11 10:24 | 23/Nov/11 14:22 | 28/Jul/11 01:50 | 3.3.3 | 3.4.0 | server | 0 | 4 | I think I have found a bug in the snapshot mechanism. The problem occurs because dt.lastProcessedZxid is not synchronized (or rather set before the data tree is modified): FileTxnSnapLog: {code} public void save(DataTree dataTree, ConcurrentHashMap<Long, Integer> sessionsWithTimeouts) throws IOException { long lastZxid = dataTree.lastProcessedZxid; LOG.info("Snapshotting: " + Long.toHexString(lastZxid)); File snapshot=new File( snapDir, Util.makeSnapshotName(lastZxid)); snapLog.serialize(dataTree, sessionsWithTimeouts, snapshot); <=== the Datatree may not have the modification for lastProcessedZxid } {code} DataTree: {code} public ProcessTxnResult processTxn(TxnHeader header, Record txn) { ProcessTxnResult rc = new ProcessTxnResult(); String debug = ""; try { rc.clientId = header.getClientId(); rc.cxid = header.getCxid(); rc.zxid = header.getZxid(); rc.type = header.getType(); rc.err = 0; if (rc.zxid > lastProcessedZxid) { lastProcessedZxid = rc.zxid; } [...modify data tree...] } {code} The lastProcessedZxid must be set after the modification is done. As a result, if server crashes after taking the snapshot (and the snapshot does not contain change corresponding to lastProcessedZxid) restore will not restore the data tree correctly: {code} public long restore(DataTree dt, Map<Long, Integer> sessions, PlayBackListener listener) throws IOException { snapLog.deserialize(dt, sessions); FileTxnLog txnLog = new FileTxnLog(dataDir); TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1); <=== Assumes lastProcessedZxid is deserialized } {code} I have had offline discussion with Ben and Camille on this. I will be posting the discussion shortly. |
persistence, server, snapshot | 47484 | No Perforce job exists for this issue. | 1 | 32725 | 8 years, 33 weeks, 1 day ago |
Reviewed
|
0|i05yq7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
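The fix proposed in ZOOKEEPER-1090 above — publish lastProcessedZxid only after the corresponding mutation — can be sketched as follows (illustrative fields, not the real DataTree):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch, not the real DataTree: apply the mutation first,
// then advance lastProcessedZxid, so a concurrent snapshot that reads the
// zxid never advertises a change the serialized tree does not contain.
public class ZxidOrderSketch {
    private final Map<String, String> tree = new ConcurrentHashMap<>();
    private volatile long lastProcessedZxid = 0;

    public void processTxn(long zxid, String path, String data) {
        tree.put(path, data);        // 1. modify the data tree
        lastProcessedZxid = zxid;    // 2. only then publish the zxid
    }

    // A snapshot labeled with this zxid is guaranteed to include every
    // change up to and including it.
    public long snapshotZxid() {
        return lastProcessedZxid;
    }

    public String get(String path) {
        return tree.get(path);
    }

    public static void main(String[] args) {
        ZxidOrderSketch t = new ZxidOrderSketch();
        t.processTxn(7, "/a", "v");
        System.out.println(t.snapshotZxid()); // 7
        System.out.println(t.get("/a"));      // v
    }
}
```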
| ZooKeeper | ZOOKEEPER-1089 | zkServer.sh status does not work due to invalid option of nc |
Bug | Resolved | Major | Fixed | Roman Shaposhnik | William Au | William Au | 09/Jun/11 09:57 | 28/Dec/11 05:58 | 28/Dec/11 01:08 | 3.3.4, 3.4.0 | 3.4.3, 3.3.5, 3.5.0 | scripts | 0 | 4 | The nc command used by zkServer.sh does not have the "-q" option on some linux versions ( I have checked RedHat/Fedora and FreeBSD). | 2429 | No Perforce job exists for this issue. | 2 | 32726 | 8 years, 13 weeks, 1 day ago |
Reviewed
|
0|i05yqf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1088 | delQuota does not remove the quota node and subsequent setquota calls for that path will fail |
Bug | Resolved | Major | Won't Fix | Camille Fournier | Camille Fournier | Camille Fournier | 08/Jun/11 16:04 | 17/Nov/11 01:05 | 13/Jun/11 16:04 | 3.3.3 | server | 0 | 1 | 86400 | 86400 | 0% | setquota -b 1000 /testing delquota -b /testing setquota -n 1024 /testing Command failed: java.lang.IllegalArgumentException: /testing has a parent /zookeeper/quota/testing which has a quota |
0% | 0% | 86400 | 86400 | 71162 | No Perforce job exists for this issue. | 0 | 32727 | 8 years, 41 weeks, 1 day ago | 0|i05yqn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1087 | ForceSync VM argument not working when set to "no" |
Bug | Closed | Blocker | Fixed | Nate Putnam | Ankit Patel | Ankit Patel | 06/Jun/11 16:09 | 23/Nov/11 14:22 | 21/Jun/11 01:34 | 3.3.3 | 3.3.4, 3.4.0 | scripts | 0 | 2 | 300 | 300 | 0% | Cannot use forceSync=no to asynchronously write transaction logs. This is a critical bug, please address it ASAP. More details: The class org.apache.zookeeper.server.persistence.FileTxnLog initializes forceSync property in a static block. However, the static variable is defined after the static block with a default value of true. Therefore, the value of the variable can never be false. Please move the declaration of the variable before the static block. |
0% | 0% | 300 | 300 | 47485 | No Perforce job exists for this issue. | 3 | 32728 | 8 years, 40 weeks, 2 days ago | Respect the "zookeeper.forceSync" system property. |
Reviewed
|
0|i05yqv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
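The root cause in ZOOKEEPER-1087 above is Java's textual initialization order, demonstrated here in a self-contained sketch (class and field names are illustrative): a value assigned in a static block is silently overwritten when the field declaration, with its initializer, appears after the block.

```java
// Illustrative demonstration of the FileTxnLog bug: static initializers
// run in textual order, so an assignment made in a static block is undone
// by a field initializer that appears *after* the block.
public class StaticOrderSketch {
    public static class Buggy {
        static { forceSync = false; }            // runs first...
        public static boolean forceSync = true;  // ...then resets it to true
    }

    public static class Fixed {
        public static boolean forceSync = true;  // declaration first
        static { forceSync = false; }            // block can now override it
    }

    public static void main(String[] args) {
        System.out.println(Buggy.forceSync); // true  -- the bug
        System.out.println(Fixed.forceSync); // false -- the fix
    }
}
```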
| ZooKeeper | ZOOKEEPER-1086 | zookeeper test jar has a non-mavenised dependency |
Bug | Closed | Major | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 01/Jun/11 06:52 | 23/Nov/11 14:22 | 19/Oct/11 02:56 | 3.4.0 | 0 | 2 | BOOKKEEPER-20 | The zookeeper test jar, (zookeeper-<version>-test.jar) depends on accessive.jar which is not available in maven. This is problematic for projects using the test jar (i.e. hedwig). | 177 | No Perforce job exists for this issue. | 2 | 32729 | 8 years, 23 weeks, 1 day ago |
Reviewed
|
0|i05yr3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1085 | CLONE - Deploy ZooKeeper jars/artifacts to a Maven Repository |
Task | Resolved | Critical | Not A Problem | Patrick D. Hunt | Michael Duergner | Michael Duergner | 01/Jun/11 03:22 | 01/Oct/13 20:10 | 01/Oct/13 20:10 | 3.0.0 | build | 0 | 0 | ZOOKEEPER-224 | Looks like 3.3.2 and 3.3.3 didn't get deployed on the Apache Maven Repository | 2430 | No Perforce job exists for this issue. | 0 | 42062 | 8 years, 43 weeks, 1 day ago | 0|i07kc7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1084 | Hard-coding a well-known location for configuration directory gives less flexibility for packaging Zookeeper configurations |
Improvement | Resolved | Minor | Duplicate | Roman Shaposhnik | Roman Shaposhnik | Roman Shaposhnik | 31/May/11 18:55 | 21/Jun/11 13:33 | 21/Jun/11 13:33 | 3.3.2 | scripts | 0 | 0 | Currently, Zookeeper relies on zkEnv.sh logic to discover the location of the configuration directory if none is specified: {noformat} # We use ZOOCFGDIR if defined, # otherwise we use /etc/zookeeper # or the conf directory that is # a sibling of this script's directory if [ "x$ZOOCFGDIR" = "x" ] then if [ -d "/etc/zookeeper" ] then ZOOCFGDIR="/etc/zookeeper" else ZOOCFGDIR="$ZOOBINDIR/../conf" fi fi {noformat} The problem with such an approach is that having /etc/zookeeper (for whatever reason) trips this logic up in believing that it is THE place. It would be much nicer to follow the suit of other Apache Hadoop projects and restrict the logic to $ZOOCFGDIR and $ZOOBINDIR/../conf Please note, that if that happens one can always have an existing behavior of picking up /etc/zookeeper by creating a symlink at $ZOOBINDIR/../conf pointing to it. |
37452 | No Perforce job exists for this issue. | 1 | 30002 | 8 years, 40 weeks, 2 days ago | 0|i05hxj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1083 | Javadoc for WatchedEvent not being generated |
Bug | Closed | Major | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 31/May/11 12:23 | 23/Nov/11 14:22 | 13/Jun/11 13:25 | 3.4.0 | 0 | 1 | See title. | 47486 | No Perforce job exists for this issue. | 1 | 32730 | 8 years, 40 weeks, 6 days ago |
Reviewed
|
0|i05yrb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1082 | ZOOKEEPER-335 modify leader election to correctly take into account current epoch |
Sub-task | Closed | Major | Fixed | Flavio Paiva Junqueira | Benjamin Reed | Benjamin Reed | 30/May/11 11:16 | 23/Nov/11 14:22 | 14/Jun/11 01:14 | 3.4.0 | server | 0 | 1 | when comparing zxids for leader election, the current epoch of the peer needs to be taken into account. | 47487 | No Perforce job exists for this issue. | 2 | 33336 | 8 years, 40 weeks, 6 days ago | Committed revision 1135382. | 0|i062hz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1081 | ZOOKEEPER-335 modify leader/follower code to correctly deal with new leader |
Sub-task | Closed | Major | Fixed | Benjamin Reed | Benjamin Reed | Benjamin Reed | 30/May/11 11:15 | 23/Nov/11 14:22 | 14/Jun/11 01:14 | 3.4.0 | server | 0 | 1 | the leader and follower code need to be modified to correctly handle and log epoch changes | 47488 | No Perforce job exists for this issue. | 2 | 33337 | 8 years, 40 weeks, 6 days ago | Committed revision 1135382. | 0|i062i7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1080 | Provide a Leader Election framework based on Zookeeper recipe |
New Feature | Resolved | Major | Duplicate | Hari A V | Hari A V | Hari A V | 30/May/11 09:24 | 17/May/14 08:03 | 17/May/14 08:03 | 3.3.2 | 3.5.0 | contrib | 6 | 21 | HDFS-1973, HIVE-2254, HDFS-2124, MAPREDUCE-2648, ZOOKEEPER-1095 | Currently, Hadoop components such as the NameNode and JobTracker are single points of failure: if either goes down, their service is unavailable until they are up and running again. If a standby NameNode or JobTracker were available and ready to serve when the active node goes down, the service downtime could be reduced. Hadoop already provides a Standby Namenode implementation, but it is not fully a "hot" standby. The common problem to be addressed in any such active-standby cluster is leader election and failure detection. This can be done using Zookeeper as described in the Zookeeper recipes. http://zookeeper.apache.org/doc/r3.3.3/recipes.html +Leader Election Service (LES)+ Any node that wants to participate in leader election can use this service by starting it with the required configuration. The service notifies each node whether it should start in active or standby mode, and also signals any mode changes at runtime. All other complexity can be handled internally by the LES. |
2431 | No Perforce job exists for this issue. | 4 | 42063 | 5 years, 44 weeks, 5 days ago | 0|i07kcf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1079 | 'Create' command in Hbase makes a table in Hbase but it sends 'Delete' request to Zookeeper !!! |
Test | Resolved | Major | Not A Problem | Unassigned | Mohamad Koohi-Moghadam | Mohamad Koohi-Moghadam | 29/May/11 02:41 | 08/Jun/11 12:24 | 08/Jun/11 12:24 | 3.3.3 | 0 | 0 | Using 'create' in HBase logs "Got user-level KeeperException... type:delete" and causes ZooKeeper to raise a NoNodeException. Also, after creating a znode named 'mkm' in the zookeeper shell, running create 'mkm', 'm' on the HBase command line deletes 'mkm' from ZooKeeper. Linux Ubuntu, Zookeeper and Hbase |
71569 | No Perforce job exists for this issue. | 0 | 33338 | 8 years, 42 weeks, 1 day ago | 0|i062if: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1078 | add maven build support to ZooKeeper |
Improvement | Resolved | Major | Duplicate | Mohammad Arshad | Patrick D. Hunt | Patrick D. Hunt | 26/May/11 20:00 | 11/Feb/19 06:45 | 11/Feb/19 06:45 | build | 4 | 17 | ZOOKEEPER-2460, ZOOKEEPER-1334, ZOOKEEPER-96, ZOOKEEPER-103, ZOOKEEPER-2158, ZOOKEEPER-3021, ZOOKEEPER-899 | I've taken a stab at creating a maven build for ZooKeeper. (attachment to follow). |
2432 | No Perforce job exists for this issue. | 5 | 2565 | 1 year, 5 weeks, 3 days ago | 0|i00slj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1077 | C client lib doesn't build on Solaris |
Bug | Closed | Critical | Fixed | Chris Nauroth | Tadeusz Andrzej Kadłubowski | Tadeusz Andrzej Kadłubowski | 26/May/11 04:35 | 21/Jul/16 16:18 | 18/May/15 03:39 | 3.3.4 | 3.4.7, 3.5.2, 3.6.0 | build, c client | 0 | 7 | ZOOKEEPER-1742 | uname -a: SunOS [redacted] 5.10 Generic_142910-17 i86pc i386 i86pc GNU toolchain (gcc 3.4.3, GNU Make etc.) |
Hello, Some minor trouble with building ZooKeeper C client library on Sun^H^H^HOracle Solaris 5.10. 1. You need to link against "-lnsl -lsocket" 2. ctime_r needs a buffer size. The signature is: "char *ctime_r(const time_t *clock, char *buf, int buflen)" 3. In zk_log.c you need to manually cast pid_t to int (-Werror can be cumbersome ;) ) 4. getpwuid_r()returns pointer to struct passwd, which works as the last parameter on Linux. Solaris signature: struct passwd *getpwuid_r(uid_t uid, struct passwd *pwd, char *buffer, int buflen); Linux signature: int getpwuid_r(uid_t uid, struct passwd *pwd, char *buf, size_t buflen, struct passwd **result); |
2433 | No Perforce job exists for this issue. | 4 | 32731 | 4 years, 44 weeks, 3 days ago | Support for building C client lib on Illumos (and presumably OpenSolaris). Configure with "CPPFLAGS=-D_POSIX_PTHREAD_SEMANTICS LDFLAGS="-lnsl -lsocket" ./configure" | 0|i05yrj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1076 | some quorum tests are unnecessarily extending QuorumBase |
Bug | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 25/May/11 18:19 | 23/Nov/11 14:22 | 29/Jul/11 04:14 | 3.4.0 | 3.4.0 | tests | 0 | 1 | Some tests are unnecessarily extending QuorumBase. Typically this is not a big issue, but it may cause more servers than necessary to be started (harder to debug a failing test in particular). |
47489 | No Perforce job exists for this issue. | 2 | 32732 | 8 years, 33 weeks, 1 day ago |
Reviewed
|
0|i05yrr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1075 | Zookeeper Server cannot join an existing ensemble if the existing ensemble doesn't already have a quorum |
Bug | Resolved | Major | Not A Problem | Unassigned | Vishal Kathuria | Vishal Kathuria | 25/May/11 17:39 | 27/May/11 03:07 | 26/May/11 14:19 | 3.3.2 | leaderElection | 0 | 4 | 172800 | 172800 | 0% | Windows 7 | Here is the sequence of steps that reproduces the problem. On a 3 server ensemble: 1. Bring up two servers (say 1 and 2). Let's say 1 is leading. 2. Bring down 2. 3. Bring up 2. 4. 2 gets a notification from 1 that it is leading, but 2 doesn't accept it as a leader since it cannot find one other node that thinks 1 is the leader. So the ensemble gets stuck with 2 not following. If at this point 3 comes up, then one of 2 & 3 will become a leader and 1 will keep thinking it is the leader. I am working on a patch to fix this issue. |
0% | 0% | 172800 | 172800 | 214218 | No Perforce job exists for this issue. | 1 | 32733 | 8 years, 43 weeks, 6 days ago | 0|i05yrz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1074 | zkServer.sh is missing nohup/sleep, which are necessary for remote invocation |
Bug | Closed | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 25/May/11 14:27 | 23/Nov/11 14:22 | 27/Jun/11 01:04 | 3.3.3, 3.4.0 | 3.4.0 | scripts | 0 | 1 | zkServer.sh is missing nohup and "sleep 1" when starting the background daemon. This is fine normally, however when running the server remotely via ssh this causes the process to not run successfully (it starts but immediately exits). I'll be submitting a patch for this shortly. |
47490 | No Perforce job exists for this issue. | 1 | 32734 | 8 years, 39 weeks, 3 days ago |
Reviewed
|
0|i05ys7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
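The nohup/sleep fix in ZOOKEEPER-1074 can be illustrated with a generic helper. This is a hypothetical sketch, not the actual zkServer.sh code; the function `start_daemon` and the file paths are invented for illustration. nohup detaches the child from the controlling terminal, and a short sleep lets it come up before the invoking shell (e.g. a remote ssh session) exits:

```shell
# Hypothetical sketch of background daemon startup surviving a remote
# ssh invocation: without nohup the child receives SIGHUP when the ssh
# session's terminal closes, and exits immediately.
start_daemon() {
  # $1 = command line, $2 = pid file, $3 = log file
  nohup sh -c "$1" > "$3" 2>&1 &
  echo $! > "$2"
  sleep 1   # give the child a moment before the parent shell exits
}
```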
| ZooKeeper | ZOOKEEPER-1073 | address a documentation issue in ZOOKEEPER-1030 |
Bug | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 25/May/11 13:52 | 23/Nov/11 14:22 | 07/Jul/11 03:32 | 3.4.0 | 3.4.0 | documentation | 0 | 1 | ZOOKEEPER-1030 | ZOOKEEPER-1030 updated the generated docs, not the source docs. I'll submit a patch to address in the src. | 47491 | No Perforce job exists for this issue. | 1 | 32735 | 8 years, 38 weeks ago |
Reviewed
|
0|i05ysf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1072 | Support for embedded ZooKeeper |
Task | Open | Major | Unresolved | Unassigned | Vishal Kher | Vishal Kher | 24/May/11 18:08 | 05/Feb/20 07:17 | 3.3.0 | 3.7.0, 3.5.8 | server | 3 | 5 | ZOOKEEPER-575 | We have seen several cases where users have embedded zookeeper in their application instead of running ZooKeeper in an independent JVM. Different applications use different ways of starting and stopping QuorumPeer. Instead, we should provide a standard and simple API for starting/stopping zookeeper (and also document it). |
embedd, server | 2434 | No Perforce job exists for this issue. | 0 | 42064 | 8 years, 44 weeks, 2 days ago | 0|i07kcn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1071 | zkServer.sh script needs to track whether ZK is already running or not |
Bug | Resolved | Major | Duplicate | Unassigned | Roman Shaposhnik | Roman Shaposhnik | 24/May/11 13:20 | 26/May/11 13:48 | 26/May/11 13:48 | scripts | 0 | 0 | If one repeatedly invokes: {noformat} /usr/lib/zookeeper/bin/zkServer.sh start {noformat} after the initial start 2 bad things happen: 1. ZK reports that it got started where in reality it failed with the following: {noformat} 2011-05-24 10:18:58,217 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181 2011-05-24 10:18:58,219 - FATAL [main:ZooKeeperServerMain@62] - Unexpected exception, exiting abnormally java.net.BindException: Address already in use {noformat} 2. It clobbers the zookeeper_server.pid file |
214217 | No Perforce job exists for this issue. | 0 | 32736 | 8 years, 44 weeks ago | 0|i05ysn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1070 | let org.apache.zookeeper.recipes.lock.WriteLock implements java.util.concurrent.locks.Lock |
Improvement | Open | Major | Unresolved | Unassigned | Yanming Zhou | Yanming Zhou | 23/May/11 22:23 | 23/May/11 22:23 | recipes | 2 | 4 | Also add a zookeeper-backed distributed java.util.concurrent.locks.ReadWriteLock. Use java.util.concurrent locks internally rather than the synchronized keyword. |
2435 | No Perforce job exists for this issue. | 0 | 42065 | 8 years, 44 weeks, 2 days ago | 0|i07kcv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1069 | Calling shutdown() on a QuorumPeer too quickly can lead to a corrupt log |
Bug | Closed | Critical | Fixed | Vishal Kher | Jeremy Stribling | Jeremy Stribling | 23/May/11 19:53 | 23/Nov/11 14:22 | 17/Jul/11 10:36 | 3.3.3 | 3.3.4, 3.4.0 | quorum, server | 0 | 1 | ZOOKEEPER-1060 | Linux, ZK 3.3.3, 3-node cluster. | I've only seen this happen once. In order to restart Zookeeper with a new set of servers, we have a wrapper class that calls shutdown() on an existing QuorumPeer, and then starts a new one with a new set of servers. Specifically, our shutdown code looks like this: {code} synchronized(_quorum_peer) { _quorum_peer.shutdown(); FastLeaderElection fle = (FastLeaderElection) _quorum_peer.getElectionAlg(); fle.shutdown(); // I think this is unnecessary try { _quorum_peer.getTxnFactory().commit(); } catch (java.nio.channels.ClosedChannelException e) { // ignore } } {code} One time, our wrapper class started one QuorumPeer, and then had to shut it down and start a new one very soon after the QuorumPeer transitioned into a FOLLOWING state. When the new QuorumPeer tried to read in the latest log from disk, it encountered a bogus magic number of all zeroes: {noformat} 2011-05-18 22:42:29,823 10467 [pool-1-thread-2] FATAL org.apache.zookeeper.server.quorum.QuorumPeer - Unable to load database on disk java.io.IOException: Transaction log: /var/cloudnet/data/zookeeper/version-2/log.700000001 has invalid magic number 0 != 1514884167 at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:510) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:527) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:493) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:576) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:479) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:454) at 
org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:325) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:126) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) ... 2011-05-18 22:42:29,823 10467 [pool-1-thread-2] ERROR com.nicira.onix.zookeeper.Zookeeper - Unexpected exception java.lang.RuntimeException: Unable to run quorum server at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:401) at com.nicira.onix.zookeeper.Zookeeper.StartZookeeper(Zookeeper.java:198) at com.nicira.onix.zookeeper.Zookeeper.RestartZookeeper(Zookeeper.java:277) at com.nicira.onix.zookeeper.ZKRPCService.setServers(ZKRPC.java:83) at com.nicira.onix.zookeeper.Zkrpc$ZKRPCService.callMethod(Zkrpc.java:8198) at com.nicira.onix.rpc.RPC$10.run(RPC.java:534) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Transaction log: /var/cloudnet/data/zookeeper/version-2/log.700000001 has invalid magic number 0 != 1514884167 at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:510) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:527) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:493) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:576) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:479) at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:454) at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:325) at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:126) at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222) at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398) ... 8 more {noformat} I looked into the code a bit, and I believe the problem comes from the fact that QuorumPeer.shutdown() does not join() on this before returning. Here's the scenario I think can happen: # QuorumPeer.run() notices it is in the FOLLOWING state, makes a new Follower, and calls Follower.followLeader(), which starts connecting to the leader. # In the main program thread, QuorumPeer.shutdown() is called. # Through a complicated series of calls, this eventually leads to FollowerZooKeeperServer.shutdown() being called. # This method calls SyncRequestProcess.shutdown(), which joins on this and returns. However, it's possible that the SyncRequestProcessor thread hasn't yet been started because followLeader() hasn't yet called Learner.syncWithLeader(), which hasn't yet called ZooKeeperServer.startup(), which actually starts the thread. Thus, the join would have no request, though a requestOfDeath is added to the queued requests list (possibly behind other requests). # Back in the main thread, FileTxnSnapLog.commit() is called, which doesn't do much because the processor hasn't processed anything yet. # Finally, ZooKeeperServer.startup is called in the QuorumPeer.run() thread, starting up the SyncRequestProcessor thread. # That thread appends some request to the log. The log doesn't exist yet, so it creates a new one, padding it with zeroes. # Now either the SyncRequestProcessor hits the requestOfDeath or the whole QuorumPeer object is deleted. It exits that thread without ever committing the log to disk (or the new QuorumPeer tries to read the log before the old thread gets to commit anything), and the log ends up with all zeroes instead of a proper magic number. 
I haven't yet looked into whether there's an easy way to join() on the QuorumPeer thread from shutdown(), so that it won't go on to start the processor threads after it's been shutdown. I wanted to check with the group first and see if anyone else agrees this could be a problem. I marked this as minor since I think almost no one else uses Zookeeper this way, but it's pretty important to me personally. I will upload a log file showing this behavior shortly. |
persistence, shutdown | 47492 | No Perforce job exists for this issue. | 4 | 32737 | 8 years, 36 weeks, 4 days ago | 0|i05ysv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1068 | Documentation and default config suggest incorrect location for Zookeeper state |
Bug | Closed | Minor | Fixed | Roman Shaposhnik | Roman Shaposhnik | Roman Shaposhnik | 23/May/11 19:08 | 23/Nov/11 14:22 | 21/Jun/11 13:24 | 3.4.0 | documentation, scripts | 0 | 1 | Documentation and default config suggest /var/zookeeper as a value for dataDir. This practice is, strictly speaking, incompatible with UNIX/Linux filesystem layout standards (e.g. http://www.s-gms.ms.edus.si/cgi-bin/man-cgi?filesystem+5 , http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/index.html ). Even though Zookeeper use is not limited to UNIX-like OSes I'd recommend that we change references to /var/zookeeper to /var/lib/zookeeper |
47493 | No Perforce job exists for this issue. | 1 | 32738 | 8 years, 40 weeks, 1 day ago |
Reviewed
|
0|i05yt3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1067 | the doxygen doc should be generated as part of the release |
Improvement | Open | Major | Unresolved | Unassigned | Benjamin Reed | Benjamin Reed | 23/May/11 12:54 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | 0 | 1 | currently our releases generate the javadoc as part of the documentation. we should also generate the doxygen for the c api. | 2436 | No Perforce job exists for this issue. | 0 | 42066 | 8 years, 36 weeks, 5 days ago | 0|i07kd3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1066 | Multi should have an async version |
Bug | Open | Major | Unresolved | Unassigned | Ted Dunning | Ted Dunning | 21/May/11 17:09 | 10/Oct/13 13:24 | c client | 1 | 2 | HBASE-7022, ZOOKEEPER-1572 | per the code review on ZOOKEEPER-965 it seems that multi should have an asynchronous version. The semantics should be essentially identical. The only difference is that the original caller shouldn't wait for the result. Cloning existing multi-operations should be a decent implementation strategy. |
2437 | No Perforce job exists for this issue. | 0 | 32739 | 6 years, 24 weeks ago | 0|i05ytb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1065 | Possible timing issue in embedded server |
Bug | Resolved | Major | Invalid | Unassigned | Gunnar Wagenknecht | Gunnar Wagenknecht | 20/May/11 02:46 | 20/May/11 14:39 | 20/May/11 14:30 | 3.3.3 | java client, server | 0 | 0 | Windows 7, 32bit, Core2 Duo T9300, JDK 1.6.0_24, ZooKeeper data on 500GB hybrid Seagate HDD with 4GB SSD cache | I have an application that uses ZooKeeper. There is an ensemble in production. But in order to simplify development the application will start an embedded ZooKeeper server when started in development mode. We are experiencing a timing issue with ZooKeeper 3.3.3 and I was wondering if this is allowed to be happen or if we did something wrong when starting the embedded server. Basically, we have a watch registered using an #exists call and watch code like the following. {code} @Override public void process(final WatchedEvent event) { switch (event.getType()) { ... case NodeCreated: pathCreated(event.getPath()); break; ... } } @Override protected void pathCreated(final String path) { // process events only for this node if (!isMyPath(path)) return; try { loadNode(); // calls zk.getData(String, Watcher, Stat) } catch (final Exception e) { // got NoNodeException here (but not when debugging) log(..., e) } } {code} From inspecting the logs we noticed a NoNodeException. When setting breakpoints on #loadNode and stepping through we don't get the exception. But when setting a breakpoint on #log only we got a hit and could confirm the issue this way. The path is actually some levels deep. All the parent paths don't exist either so they are created as well. However, no exception is thrown fro them. The sequence is as follows. 
{noformat} /l1 --> watch triggered, getData, no exception /l1/l2 --> watch triggered, getData, no exception /l1/l2/l3 --> watch triggered, getData, no exception /l1/l2/l3/l4 --> watch triggered, getData, no exception /l1/l2/l3/l4/l5 --> watch triggered, getData, no exception /l1/l2/l3/l4/l5/l6 --> watch triggered, getData, NoNodeException {noformat} The only difference is that all paths up to including l5 do not actually have any data. Only l6 has some data. Could there be some latency issues? For completeness, the embedded server is started as follows. {code} // disable LOG4J JMX stuff System.setProperty("zookeeper.jmx.log4j.disable", Boolean.TRUE.toString()); // get directories final File dataDir = new File(config.getDataLogDir()); final File snapDir = new File(config.getDataDir()); // clean old logs PurgeTxnLog.purge(dataDir, snapDir, 3); // create standalone server zkServer = new ZooKeeperServer(); zkServer.setTxnLogFactory(new FileTxnSnapLog(dataDir, snapDir)); zkServer.setTickTime(config.getTickTime()); zkServer.setMinSessionTimeout(config.getMinSessionTimeout()); zkServer.setMaxSessionTimeout(config.getMaxSessionTimeout()); factory = new NIOServerCnxn.Factory(config.getClientPortAddress(), config.getMaxClientCnxns()); // start server LOG.info("Starting ZooKeeper standalone server."); try { factory.startup(zkServer); } catch (final InterruptedException e) { LOG.warn("Interrupted during server start.", e); Thread.currentThread().interrupt(); } {code} |
214216 | No Perforce job exists for this issue. | 1 | 32740 | 8 years, 44 weeks, 6 days ago | 0|i05ytj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1064 | Startup script needs more LSB compatibility |
Bug | Resolved | Major | Implemented | Unassigned | Ted Dunning | Ted Dunning | 18/May/11 20:36 | 10/Oct/13 13:25 | 10/Oct/13 13:25 | 3.3.2 | 0 | 2 | ZOOKEEPER-999 | The zkServer.sh script kind of sort of implements the standard init.d style of interaction. It lacks - nice return codes - status method - standard output messages See http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html and http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptfunc.html and http://wiki.debian.org/LSBInitScripts It is an open question how much zkServer should use these LSB scripts because that may impair portability. I think it should produce similar messages, however, and should return standardized error codes. If lsb functions are available, I think that they should be used so that ZK works as a first class citizen. I will produce a proposed patch. |
2439 | No Perforce job exists for this issue. | 0 | 32741 | 6 years, 24 weeks ago | 0|i05ytr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
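The "status method" and "nice return codes" asked for in ZOOKEEPER-1064 follow a well-known LSB convention: the `status` init action exits 0 when the program is running and 3 when it is not. A minimal sketch, assuming a pid-file layout (the function name `zk_status` and paths are illustrative, not the shipped script):

```shell
# Sketch of an LSB-style "status" action (hypothetical, not the actual
# zkServer.sh): exit 0 = program is running, exit 3 = program is not
# running, per the LSB init-script action specification.
zk_status() {
  # $1 = pid file
  if [ -f "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null; then
    echo "ZooKeeper is running (pid $(cat "$1"))"
    return 0
  fi
  echo "ZooKeeper is not running"
  return 3
}
```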
| ZooKeeper | ZOOKEEPER-1063 | Dubious synchronization in Zookeeper and ClientCnxnSocketNIO classes |
Bug | Closed | Critical | Fixed | Yanick Dufresne | Yanick Dufresne | Yanick Dufresne | 17/May/11 17:34 | 23/Nov/11 14:22 | 15/Jul/11 00:11 | 3.4.0 | java client | 0 | 2 | Synchronization around dataWatches, existWatches and childWatches in Zookeeper is incorrect. Synchronization around outgoingQueue and pendingQueue in ClientCnxnSocketNIO is incorrect. Synchronization around selector and key sets in ClientCnxnSocketNIO seems odd. |
47494 | No Perforce job exists for this issue. | 3 | 32742 | 8 years, 36 weeks, 6 days ago | 0|i05ytz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1062 | Net-ZooKeeper: Net::ZooKeeper consumes 100% cpu on wait |
Bug | Resolved | Major | Fixed | Botond Hejj | Patrick D. Hunt | Patrick D. Hunt | 13/May/11 01:50 | 20/May/14 07:09 | 16/May/14 18:33 | 3.3.1, 3.4.5, 3.4.6 | 3.4.7, 3.5.0 | contrib-bindings | 0 | 5 | Reported by a user on the CDH user list (user reports that the listed fix addressed this issue for him): "Net::ZooKeeper consumes 100% cpu when "wait" is used. At my initial inspection, it seems to be related to implementation mistake in pthread_cond_timedwait." https://rt.cpan.org/Public/Bug/Display.html?id=61290 |
patch | 2440 | No Perforce job exists for this issue. | 2 | 32743 | 5 years, 44 weeks, 2 days ago | Cosmetic fixes to the patch | 0|i05yu7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1061 | Zookeeper stop fails if start called twice |
Bug | Closed | Major | Fixed | Ted Dunning | Ted Dunning | Ted Dunning | 10/May/11 16:38 | 30/Mar/17 10:27 | 16/May/11 13:12 | 3.3.2 | 3.4.0 | scripts | 0 | 4 | The zkServer.sh script doesn't check properly to see if a previously started server is still running. If you call start twice, the second invocation will over-write the PID file with a process that then fails due to port occupancy. This means that stop will subsequently fail. Here is a reference that describes how init scripts should normally work: http://refspecs.freestandards.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html |
37453 | No Perforce job exists for this issue. | 1 | 32744 | 2 years, 51 weeks ago |
Reviewed
|
0|i05yuf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
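The missing check described in ZOOKEEPER-1061 can be sketched as a pid-file liveness guard. This is an illustrative sketch, not the shipped zkServer.sh; the helper name `already_running` is invented here. A "start" action using it would bail out when the pid file already names a live process, instead of overwriting the pid file and then dying on the occupied client port:

```shell
# Hypothetical guard against double "start": succeeds iff the pid file
# exists and names a live process (kill -0 only probes, sends no signal).
already_running() {
  # $1 = pid file
  [ -f "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}
```

A start action would then begin with something like `if already_running "$ZOOPIDFILE"; then echo "already running"; exit 0; fi`, which also keeps the subsequent "stop" working.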
| ZooKeeper | ZOOKEEPER-1060 | QuorumPeer takes a long time to shutdown |
Bug | Closed | Minor | Fixed | Vishal Kher | Vishal Kher | Vishal Kher | 10/May/11 15:32 | 23/Nov/11 14:22 | 14/Jun/11 08:14 | 3.4.0 | 3.4.0 | quorum | 0 | 2 | ZOOKEEPER-1069 | This problem is seen only if you have ZooKeeper embedded in your application. QuorumPeerMain.initializeAndRun() does a quorumPeer.join() before exiting. QuorumPeer.shutdown() tries to clean up everything, but it does not interrupt itself. As a result, if the peer is running FLE, it might be waiting to receive notifications (recvqueue.poll()) in FastLeaderElection. Therefore, quorumPeer.join() will wait until the peer wakes up from poll(). The fix is simple - call this.interrupt() in QuorumPeer.shutdown(). |
47495 | No Perforce job exists for this issue. | 1 | 32745 | 8 years, 41 weeks, 2 days ago | 0|i05yun: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1059 | stat command issued on non-existing node causes NPE |
Bug | Closed | Major | Fixed | Bhallamudi Venkata Siva Kamesh | Bhallamudi Venkata Siva Kamesh | Bhallamudi Venkata Siva Kamesh | 04/May/11 06:50 | 23/Nov/11 14:22 | 16/May/11 13:39 | 3.4.0 | java client | 0 | 1 | A *stat* command issued on a non-existent zookeeper node causes an NPE on the client. {noformat} [zk: localhost:2181(CONNECTED) 2] stat /invalidPath Exception in thread "main" java.lang.NullPointerException at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:131) at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:723) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:582) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:354) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:312) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:271) {noformat} |
47496 | No Perforce job exists for this issue. | 1 | 32746 | 8 years, 45 weeks, 2 days ago |
Reviewed
|
0|i05yuv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1058 | fix typo in opToString for getData |
Bug | Closed | Trivial | Fixed | Camille Fournier | Camille Fournier | Camille Fournier | 03/May/11 20:34 | 23/Nov/11 14:22 | 20/May/11 17:42 | 3.4.0 | 0 | 1 | fix Request getData to print that instead of getDate | 47497 | No Perforce job exists for this issue. | 1 | 32747 | 8 years, 44 weeks, 5 days ago | Committed revision 1125544 |
Reviewed
|
0|i05yv3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1057 | zookeeper c-client, connection to offline server fails to successfully fallback to second zk host |
Bug | Closed | Blocker | Fixed | Michi Mutsuzaki | Woody Anderson | Woody Anderson | 02/May/11 21:16 | 13/Mar/14 14:17 | 09/Jan/14 16:04 | 3.3.1, 3.3.2, 3.3.3 | 3.4.6, 3.5.0 | c client | 2 | 8 | snowdutyrise-lm ~/-> uname -a Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386 also observed on: 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011 |
Hello, I'm a contributor for the node.js zookeeper module: https://github.com/yfinkelstein/node-zookeeper i'm using zk 3.3.3 for the purposes of this issue, but i have validated it fails on 3.3.1 and 3.3.2 i'm having an issue when trying to connect when one of my zookeeper servers is offline. if the first server attempted is online, all is good. if the offline server is attempted first, then the client is never able to connect to _any_ server. inside zookeeper.c a connection loss (-4) is received, the socket is closed and buffers are cleaned up, it then attempts the next server in the list, creates a new socket (which gets the same fd as the previously closed socket) and connecting fails, and it continues to fail seemingly forever. The nature of this "fail" is not that it gets -4 connection loss errors, but that zookeeper_interest doesn't find anything going on on the socket before the user provided timeout kicks things out. I don't want to have to wait 5 minutes, even if i could make myself. this is the message that follows the connection loss: 2011-04-27 23:18:28,355:13485:ZOO_ERROR@handle_socket_error_msg@1530: Socket [127.0.0.1:5020] zk retcode=-7, errno=60(Operation timed out): connection timed out (exceeded timeout by 3ms) 2011-04-27 23:18:28,355:13485:ZOO_ERROR@yield@213: yield:zookeeper_interest returned error: -7 - operation timeout While investigating, i decided to comment out close(zh->fd) in handle_error (zookeeper.c#1153) now everything works (obviously i'm leaking an fd). Connection the the second host works immediately. this is the behavior i'm looking for, though i clearly don't want to leak the fd, so i'm wondering why the fd re-use is causing this issue. close() is not returning an error (i checked even though current code assumes success). i'm on osx 10.6.7 i tried adding a setsockopt so_linger (though i didn't want that to be a solution), it didn't work. 
full debug traces are included in issue here: https://github.com/yfinkelstein/node-zookeeper/issues/6 |
2441 | No Perforce job exists for this issue. | 7 | 32748 | 6 years, 2 weeks ago | 0|i05yvb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1056 | Questions and Improvements for the C client codebase |
Bug | Open | Minor | Unresolved | Unassigned | Stephen Tyree | Stephen Tyree | 26/Apr/11 23:20 | 26/Apr/11 23:20 | 3.4.0 | c client | 0 | 0 | Having been using the C client for a few months now, I thought I'd look through the code and see if anything could be improved and/or fixed in order to be a good citizen. Here are some observations and questions I was hoping people could elaborate on. - There appears to be a bug in sub_string (zookeeper.c). The third argument being passed into strncmp is a conditional due to misplaced parenthesis, meaning the length is either 0 or 1. This likely leads to many, many false positives of chroots matching paths. - There appears to be a bug in queue_session_event, where we check for cptr->buffer not being NULL after already dereferencing it - In both queue_buffer and queue_completion_nolock, we assert a conditional that we just checked for - What is the policy on whether the result of memory allocations are checked for, assert'd against or ignored? This is done inconsistently. - What is the policy on whether pointers are checked/set against NULL versus 0? This is done inconsistently. - Some functions, such as zoo_wget_children2_, exhibit needlessly high cyclomatic complexity - What is the policy on line length restrictions? Some functions go through hurdles to enforce 80 characters while others do no such thing. - What is the policy on indentation and spacing of if statements and blocks of code? This is done inconsistently. If any or all of these turn out to be issues that need to be fixed I'd be more than happy to do so. |
2442 | No Perforce job exists for this issue. | 0 | 32749 | 8 years, 48 weeks, 1 day ago | 0|i05yvj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1055 | check for duplicate ACLs in addACL() and create() |
Bug | Closed | Major | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 26/Apr/11 17:08 | 23/Nov/11 14:22 | 14/Aug/11 20:35 | 3.4.0 | 3.4.0 | 0 | 1 | ZOOKEEPER-1173, ZOOKEEPER-1125 | actual result: [zk: (CONNECTED) 0] create /test2 'test2' digest:test:test:cdrwa,digest:test:test:cdrwa Created /test2 [zk: (CONNECTED) 1] getAcl /test2 'digest,'test:test : cdrwa 'digest,'test:test : cdrwa [zk: (CONNECTED) 2] but getAcl should only have a single entry. |
47498 | No Perforce job exists for this issue. | 6 | 32750 | 8 years, 32 weeks, 3 days ago | refresh against trunk. |
Reviewed
|
0|i05yvr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1054 | Drop connections from servers not in the cluster configuration |
Improvement | Open | Minor | Unresolved | Bhallamudi Venkata Siva Kamesh | Bhallamudi Venkata Siva Kamesh | Bhallamudi Venkata Siva Kamesh | 26/Apr/11 04:29 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | leaderElection | 0 | 2 | Let us suppose the zookeeper cluster is running on the following machines {noformat} server.1=10.18.52.133:2999:3999 server.2=10.18.52.253:2999:3999 server.3=10.18.52.96:2999:3999 {noformat} Now suppose another zookeeper (10.18.52.109), which is not part of the cluster configuration, tries to participate in the leader election; then one of the zookeeper servers' logs is filled with the following INFO messages {noformat} 2011-04-19 17:42:42,457 - INFO [/10.18.52.133:3999:QuorumCnxManager$Listener@486] - Received connection request /10.18.52.109:18324 {noformat} |
security | 34 | No Perforce job exists for this issue. | 5 | 711 | 6 years, 2 weeks, 1 day ago | 0|i00h5b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1053 | PurgeTxnLog only takes relative paths |
Improvement | Resolved | Major | Invalid | Unassigned | Jun Rao | Jun Rao | 25/Apr/11 20:53 | 23/May/14 07:30 | 23/May/14 07:30 | 3.3.3 | server | 0 | 2 | PurgeTxnLog only works on relative path for the data and the snapshot directory. It should support absolute paths too. | 2443 | No Perforce job exists for this issue. | 0 | 42067 | 5 years, 43 weeks, 6 days ago | 0|i07kdb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1052 | Findbugs warning in QuorumPeer.ResponderThread.run() |
Bug | Closed | Major | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 24/Apr/11 09:45 | 23/Nov/11 14:22 | 03/May/11 13:58 | 3.3.2 | 3.4.0 | 1 | 1 | {noformat} REC Exception is caught when Exception is not thrown in org.apache.zookeeper.server.quorum.QuorumPeer$ResponderThread.run() {noformat} |
47499 | No Perforce job exists for this issue. | 1 | 32751 | 8 years, 47 weeks, 1 day ago | 0|i05yvz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1051 | SIGPIPE in Zookeeper 0.3.* when send'ing after cluster disconnection |
Bug | Closed | Minor | Fixed | Stephen Tyree | Stephen Tyree | Stephen Tyree | 21/Apr/11 09:56 | 23/Nov/11 14:21 | 30/Aug/11 03:02 | 3.3.2, 3.3.3, 3.4.0 | 3.4.0 | c client | 1 | 3 | 7200 | 7200 | 0% | In libzookeeper_mt, if your process is going rather slowly (such as when running it in Valgrind's Memcheck) or you are using gdb with breakpoints, you can occasionally get SIGPIPE when trying to send a message to the cluster. For example: ==12788== ==12788== Process terminating with default action of signal 13 (SIGPIPE) ==12788== at 0x3F5180DE91: send (in /lib64/libpthread-2.5.so) ==12788== by 0x7F060AA: ??? (in /usr/lib64/libzookeeper_mt.so.2.0.0) ==12788== by 0x7F06E5B: zookeeper_process (in /usr/lib64/libzookeeper_mt.so.2.0.0) ==12788== by 0x7F0D38E: ??? (in /usr/lib64/libzookeeper_mt.so.2.0.0) ==12788== by 0x3F5180673C: start_thread (in /lib64/libpthread-2.5.so) ==12788== by 0x3F50CD3F6C: clone (in /lib64/libc-2.5.so) ==12788== This is probably not the behavior we would like, since we handle server disconnections after a failed call to send. To fix this, there are a few options we could use. For BSD environments, we can tell a socket to never send SIGPIPE with send using setsockopt: setsockopt(sd, SOL_SOCKET, SO_NOSIGPIPE, (void *)&set, sizeof(int)); For Linux environments, we can add a MSG_NOSIGNAL flag to every send call that says to not send SIGPIPE on a bad file descriptor. For more information, see: http://stackoverflow.com/questions/108183/how-to-prevent-sigpipes-or-handle-them-properly |
0% | 0% | 7200 | 7200 | 47500 | No Perforce job exists for this issue. | 2 | 32752 | 8 years, 30 weeks, 2 days ago | Add flag to socket send on Linux that prevents SIGPIPE from being fired should the Zookeeper cluster close the connection on its side. |
Reviewed
|
0|i05yw7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1050 | zooinspector shell scripts do not work |
Bug | Resolved | Trivial | Fixed | Will Johnson | Chris Burroughs | Chris Burroughs | 20/Apr/11 20:46 | 06/Jan/12 05:57 | 05/Jan/12 20:23 | 3.3.2 | 3.5.0 | contrib | 0 | 3 | * zooInspector-dev.sh uses DOS line endings. Dash at least chokes on this. * zooInspector.sh has an errant ; in the classpath. Also there really isn't a reason to hard code the zookeeper version needed in lib. Just use a glob. |
zooinspector | 2444 | No Perforce job exists for this issue. | 2 | 32753 | 8 years, 11 weeks, 6 days ago |
Reviewed
|
0|i05ywf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1049 | Session expire/close flooding renders heartbeats to delay significantly |
Bug | Closed | Critical | Fixed | Chang Song | Chang Song | Chang Song | 15/Apr/11 23:42 | 23/Nov/11 14:22 | 03/May/11 17:30 | 3.3.2 | 3.3.4, 3.4.0 | server | 0 | 6 | ZOOKEEPER-1238 | CentOS 5.3, three node ZK ensemble | Let's say we have 100 clients (group A) already connected to a three-node ZK ensemble with a session timeout of 15 seconds, and 1000 clients (group B) already connected to the same ZK ensemble, all watching several nodes (also with a 15-second session timeout). Consider a case in which all clients in group B suddenly hang or deadlock (JVM OOME) at the same time. 15 seconds later, all sessions in group B get expired, creating a session-closing stampede. Depending on the number of clients in group B, every request/response the ZK ensemble processes gets delayed by up to 8 seconds (with the 1000 clients we have tested). This delay causes some clients in group A to have their sessions expired due to the delay in getting a heartbeat response, which causes normal servers to drop out of clusters. This is a serious problem in our installation, since some of our services running batch servers or CI servers create the same scenario as above almost every day. I am attaching a graph showing the ping response time delay. I think the ordering of creating/closing sessions and ping exchange isn't important (quorum state machine); at least ping requests/responses should be handled independently (different queue and different thread) to keep the realtime-ness of pings. As a workaround, we are raising the session timeout to 50 seconds, but this causes the maximum failover time of the cluster to increase significantly, so the initial QoS we promised cannot be met. |
47501 | No Perforce job exists for this issue. | 3 | 32754 | 8 years, 24 weeks, 1 day ago | 0|i05ywn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1048 | addauth command does not work in cli_mt/cli_st |
Bug | Resolved | Major | Fixed | allengao | allengao | allengao | 13/Apr/11 05:40 | 02/Mar/16 20:36 | 05/May/12 23:47 | 3.3.1 | 3.3.6, 3.4.4, 3.5.0 | c client | 0 | 3 | 604800 | 604800 | 0% | SUSE_64 | I cannot operate on a node with an ACL via "addauth" when using cli_st. I have fixed this bug. Original: else if (startsWith(line, "addauth ")) { char *ptr; line += 8; ptr = strchr(line, ' '); if (ptr) { *ptr = '\0'; ptr++; } zoo_add_auth(zh, line, ptr, ptr ? strlen(ptr) -1 : 0, NULL, NULL); Now: zoo_add_auth(zh, line, ptr, ptr ? strlen(ptr) : 0, NULL, NULL); strlen(ptr) is correct: the original strlen(ptr) - 1 drops the last character of the credential. |
0% | 0% | 604800 | 604800 | patch | 2445 | No Perforce job exists for this issue. | 0 | 32755 | 7 years, 46 weeks, 4 days ago | addauth | 0|i05ywv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1047 | ZooKeeper Standalone does not shutdown cleanly |
Bug | Open | Major | Unresolved | Unassigned | Gunnar Wagenknecht | Gunnar Wagenknecht | 13/Apr/11 05:04 | 13/Apr/11 05:04 | 3.3.3 | server | 0 | 1 | When I shutdown a standalone ZooKeeper server (programmatically) I get the following exception logged. Occasionally, no exception is logged. {noformat} 10:32:43.353 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] WARN o.a.zookeeper.server.NIOServerCnxn - Ignoring unexpected runtime exception java.nio.channels.CancelledKeyException: null at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) ~[na:1.6.0_24] at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69) ~[na:1.6.0_24] at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:241) ~[na:na] 10:32:43.353 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] INFO o.a.zookeeper.server.NIOServerCnxn - NIOServerCnxn factory exited run method 10:32:43.387 [SyncThread:0] INFO o.a.z.server.SyncRequestProcessor - SyncRequestProcessor exited! 10:32:43.387 [ProcessThread:-1] INFO o.a.z.server.PrepRequestProcessor - PrepRequestProcessor exited loop! 10:32:43.387 [app thread] INFO o.a.z.server.FinalRequestProcessor - shutdown of request processor complete {noformat} Because it's logged at WARN level, my assumption is that something is wrong on shutdown. However, I follow the exact same shutdown order as ZooKeeperMain, i.e. shutdown the {{NIOServerCnxn.Factory}} first and shutdown the {{ZooKeeperServer}} instance thereafter if it is still running. {noformat} ... factory.shutdown(); factory = null; if (zkServer.isRunning()) { zkServer.shutdown(); } zkServer = null; {noformat} |
2446 | No Perforce job exists for this issue. | 0 | 32756 | 8 years, 50 weeks, 1 day ago | 0|i05yx3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1046 | Creating a new sequential node results in a ZNODEEXISTS error |
Bug | Closed | Blocker | Fixed | Vishal Kher | Jeremy Stribling | Jeremy Stribling | 12/Apr/11 18:24 | 23/Nov/11 14:22 | 14/Jul/11 10:24 | 3.3.2, 3.3.3 | 3.3.4, 3.4.0 | server | 2 | 3 | A 3 node-cluster running Debian squeeze. | On several occasions, I've seen a create() with the sequential flag set fail with a ZNODEEXISTS error, and I don't think that should ever be possible. In past runs, I've been able to closely inspect the state of the system with the command line client, and saw that the parent znode's cversion is smaller than the sequential number of existing children znode under that parent. In one example: {noformat} [zk:<ip:port>(CONNECTED) 3] stat /zkrsm cZxid = 0x5 ctime = Mon Jan 17 18:28:19 PST 2011 mZxid = 0x5 mtime = Mon Jan 17 18:28:19 PST 2011 pZxid = 0x1d819 cversion = 120710 dataVersion = 0 aclVersion = 0 ephemeralOwner = 0x0 dataLength = 0 numChildren = 2955 {noformat} However, the znode /zkrsm/000000000000002d_record0000120804 existed on disk. In a recent run, I was able to capture the Zookeeper logs, and I will attach them to this JIRA. The logs are named as nodeX.<zxid_prefixes>.log, and each new log represents an application process restart. Here's the scenario: # There's a cluster with nodes 1,2,3 using zxid 0x3. # All three nodes restart, forming a cluster of zxid 0x4. # Node 3 restarts, leading to a cluster of 0x5. At this point, it seems like node 1 is the leader of the 0x5 epoch. 
In its log (node1.0x4-0x5.log) you can see the first (of many) instances of the following message: {noformat} 2011-04-11 21:16:12,607 16649 [ProcessThread:-1] INFO org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x512f466bd44e0002 type:create cxid:0x4da376ab zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/zkrsm/00000000000000b2_record0001761440 Error:KeeperErrorCode = NodeExists for /zkrsm/00000000000000b2_record0001761440 {noformat} This then repeats forever as my application isn't expecting to ever get this error message on a sequential node create, and just continually retries. The message even transfers over to node3.0x5-0x6.log once the 0x6 epoch comes into play. I don't see anything terribly fishy in the transition between the epochs; the correct snapshots seem to be getting transferred, etc. Unfortunately I don't have a ZK snapshot/log that exhibits the problem when starting with a fresh system. Some oddities you might notice in these logs: * Between epochs 0x3 and 0x4, the zookeeper IDs of the nodes changed due to a bug in our application code. (They are assigned randomly, but are supposed to be consistent across restarts.) * We manage node membership dynamically, and our application restarts the ZooKeeperServer classes whenever a new node wants to join (without restarting the entire application process). This is why you'll see messages like the following in node1.0x4-0x5.log before a new election begins: {noformat} 2011-04-11 21:16:00,762 4804 [QuorumPeer:/0.0.0.0:2888] INFO org.apache.zookeeper.server.quorum.Learner - shutdown called {noformat} * There is in fact one of these dynamic membership changes in node1.0x4-0x5.log, just before the 0x4 epoch is formed. I'm not sure how this would be related though, as no transactions are done during this period. |
sequence | 47502 | No Perforce job exists for this issue. | 10 | 32757 | 8 years, 21 weeks, 6 days ago | sequential znodeexists | 0|i05yxb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1045 | Support Quorum Peer mutual authentication via SASL |
New Feature | Closed | Critical | Fixed | Rakesh Radhakrishnan | Eugene Joseph Koontz | Eugene Joseph Koontz | 06/Apr/11 18:01 | 14/Jul/19 12:39 | 05/Dec/16 19:20 | 3.4.10 | quorum, security | 2 | 31 | ZOOKEEPER-2759, ZOOKEEPER-2689, ZOOKEEPER-2712, ZOOKEEPER-938, ZOOKEEPER-107, ZOOKEEPER-2433, ZOOKEEPER-2479, ZOOKEEPER-2650, ZOOKEEPER-2639 | ZOOKEEPER-938 addresses mutual authentication between clients and servers. This bug, on the other hand, is for authentication among quorum peers. Hopefully much of the work done on SASL integration with Zookeeper for ZOOKEEPER-938 can be used as a foundation for this enhancement. Review board: https://reviews.apache.org/r/47354/ |
2447 | No Perforce job exists for this issue. | 29 | 42068 | 35 weeks, 4 days ago |
Reviewed
|
0|i07kdj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1044 | ZOOKEEPER-107 Allow dynamic changes to roles of a peer |
Sub-task | Resolved | Major | Fixed | Alexander Shraer | Vishal Kher | Vishal Kher | 04/Apr/11 14:27 | 13/Jun/16 10:20 | 23/May/14 14:14 | 3.3.0 | 3.5.0 | quorum | 2 | 10 | ZOOKEEPER-107 | Requirement: functionality that will reconfigure a OBSERVER to become a voting member and vice versa. Example of usage: 1. Maintain the Quorum size without changing the cluster size - in a 5 node cluster with 2 observers, I decide to decommission a voting member. Then, I would like to configure one of my observers to be a follower without any down time. 2. Added a new server to the cluster that has better resources than one of the voting peers. Make the new node as voting peer and the old one as observer. 3. Reduce the size of voting member for performance reasons. Fix to ZOOKEEPER-107 might automatically give us this functionality. It will be good to confirm that, and if needed, highlight work that might be needed in addition to ZOOKEEPER-107. |
2448 | No Perforce job exists for this issue. | 0 | 42069 | 3 years, 40 weeks, 3 days ago | 0|i07kdr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1043 | Looped NPE at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) |
Bug | Open | Major | Unresolved | Unassigned | César Álvarez Núñez | César Álvarez Núñez | 04/Apr/11 11:34 | 28/Aug/15 16:03 | 3.3.3, 3.4.6 | 2 | 8 | Sparc Solaris 10 and 11 Java 6u17 64 bits 5 nodes ensemble |
I'm sorry but I only have this log (which belongs to a "follower" node) and a previous message [Unexpected NodeCreated event after a reconnection.|http://mail-archives.apache.org/mod_mbox/zookeeper-user/201103.mbox/%3CAANLkTi=vmZ5v4W6FMhWg4XO6rJT89eGozGUE840bku0_@mail.gmail.com%3E] where I describe a potential side-effect at client side. {noformat} 2011-04-04 09:31:09,608 - INFO [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 1700527e36 2011-04-04 09:31:09,653 - INFO [SyncThread:1:FileTxnLog@197][] - Creating new log file: log.1700527e38 2011-04-04 10:13:39,287 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.69:1093 2011-04-04 10:13:39,371 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@777][] - Client attempting to establish new session at /XXX.XXX.XXX.69:1093 2011-04-04 10:13:39,376 - INFO [CommitProcessor:1:NIOServerCnxn@1580][] - Established session 0x12ee79c4a720022 with negotiated timeout 20000 for client /XXX.XXX.XXX.69:1093 2011-04-04 12:04:11,131 - INFO [SyncThread:1:FileTxnLog@197][] - Creating new log file: log.170053bf15 2011-04-04 12:04:11,131 - INFO [Snapshot Thread:FileTxnSnapLog@208][] - Snapshotting: 170053bf17 2011-04-04 12:13:10,779 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.63:1817 2011-04-04 12:13:10,790 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@777][] - Client attempting to establish new session at /XXX.XXX.XXX.63:1817 2011-04-04 12:13:10,794 - INFO [CommitProcessor:1:NIOServerCnxn@1580][] - Established session 0x12ee79c4a720023 with negotiated timeout 20000 for client /XXX.XXX.XXX.63:1817 2011-04-04 12:13:10,814 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@634][] - EndOfStreamException: Unable to read additional data from client sessionid 0x12ee79c4a720023, likely client has closed socket 2011-04-04 12:13:10,816 - INFO 
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn@1435][] - Closed socket connection for client /XXX.XXX.XXX.63:1817 which had sessionid 0x12ee79c4a720023 2011-04-04 12:13:10,839 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@251][] - Accepted socket connection from /XXX.XXX.XXX.63:1814 2011-04-04 12:13:10,840 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@274][] - Ignoring exception java.net.SocketException: Invalid argument at sun.nio.ch.Net.setIntOption0(Native Method) at sun.nio.ch.Net.setIntOption(Unknown Source) at sun.nio.ch.SocketChannelImpl$1.setInt(Unknown Source) at sun.nio.ch.SocketOptsImpl.setBoolean(Unknown Source) at sun.nio.ch.SocketOptsImpl$IP$TCP.noDelay(Unknown Source) at sun.nio.ch.OptionAdaptor.setTcpNoDelay(Unknown Source) at sun.nio.ch.SocketAdaptor.setTcpNoDelay(Unknown Source) at org.apache.zookeeper.server.NIOServerCnxn.<init>(NIOServerCnxn.java:1367) at org.apache.zookeeper.server.NIOServerCnxn$Factory.createConnection(NIOServerCnxn.java:215) at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:256) 2011-04-04 12:13:10,841 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) 2011-04-04 12:13:10,841 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) 2011-04-04 12:13:10,842 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) ... ... ... 
2011-04-04 16:49:23,101 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2301:NIOServerCnxn$Factory@272][] - Ignoring unexpected runtime exception java.lang.NullPointerException at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:244) {noformat} |
2449 | No Perforce job exists for this issue. | 3 | 32758 | 4 years, 29 weeks, 6 days ago | 0|i05yxj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1042 | ZOOKEEPER-1037 Generate zookeeper test jar for maven installation |
Sub-task | Closed | Major | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 31/Mar/11 11:37 | 23/Nov/11 14:22 | 01/Apr/11 13:22 | 3.4.0 | contrib-bookkeeper, contrib-hedwig | 0 | 2 | Bookkeeper and hedwig both need access to zookeeper test classes. This JIRA is to provide that. | 47503 | No Perforce job exists for this issue. | 4 | 33339 | 8 years, 51 weeks, 5 days ago |
Reviewed
|
0|i062in: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1041 | ZOOKEEPER-1037 get hudson running on bookkeeper |
Sub-task | Resolved | Major | Implemented | Unassigned | Benjamin Reed | Benjamin Reed | 30/Mar/11 01:32 | 08/Oct/13 18:45 | 08/Oct/13 18:45 | contrib-bookkeeper, contrib-hedwig | 0 | 0 | setup hudson to run on bookkeeper code | 2450 | No Perforce job exists for this issue. | 0 | 42070 | 9 years, 1 day ago | 0|i07kdz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1040 | ZOOKEEPER-1037 create bookkeeper webpage |
Sub-task | Resolved | Major | Fixed | Unassigned | Benjamin Reed | Benjamin Reed | 30/Mar/11 01:31 | 28/Apr/11 19:09 | 28/Apr/11 19:09 | contrib-bookkeeper, contrib-hedwig | 0 | 0 | create a webpage for bookkeeper | 47504 | No Perforce job exists for this issue. | 0 | 33340 | 9 years, 1 day ago | 0|i062iv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1039 | ZOOKEEPER-1037 give bookkeeper committers access to bookkeeper svn |
Sub-task | Resolved | Major | Fixed | Unassigned | Benjamin Reed | Benjamin Reed | 30/Mar/11 01:30 | 28/Apr/11 19:09 | 28/Apr/11 19:09 | contrib-bookkeeper, contrib-hedwig | 0 | 0 | need to give ivan, utkarsh, and dhruba svn access to bookkeeper svn | 47505 | No Perforce job exists for this issue. | 0 | 33341 | 9 years, 1 day ago | 0|i062j3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1038 | ZOOKEEPER-1037 Move bookkeeper and hedwig code in subversion |
Sub-task | Resolved | Major | Fixed | Unassigned | Benjamin Reed | Benjamin Reed | 30/Mar/11 01:28 | 05/Apr/11 16:05 | 05/Apr/11 16:05 | contrib-bookkeeper, contrib-hedwig | 0 | 0 | need to do an svn move of the hedwig and bookkeeper code to the bookkeeper subversion | 47506 | No Perforce job exists for this issue. | 0 | 33342 | 8 years, 51 weeks, 6 days ago | 0|i062jb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1037 | Create BookKeeper subproject |
Task | Resolved | Major | Implemented | Unassigned | Benjamin Reed | Benjamin Reed | 30/Mar/11 01:27 | 08/Oct/13 18:45 | 08/Oct/13 18:45 | contrib-bookkeeper, contrib-hedwig | 0 | 0 | ZOOKEEPER-1038, ZOOKEEPER-1039, ZOOKEEPER-1040, ZOOKEEPER-1041, ZOOKEEPER-1042 | move the hedwig and bookkeeper code to the bookkeeper subproject | 2451 | No Perforce job exists for this issue. | 0 | 42071 | 9 years, 1 day ago | 0|i07ke7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1036 | send UPTODATE to follower until a quorum of servers synced with leader |
Bug | Resolved | Major | Not A Problem | Unassigned | jiangwen wei | jiangwen wei | 28/Mar/11 20:46 | 31/Mar/11 16:26 | 31/Mar/11 16:26 | server | 0 | 0 | 1. current process: when the leader fails, a new leader will be elected and followers will sync with the new leader. After syncing, the leader sends UPTODATE to the followers. 2. a corner case: there is a corner case where things will go wrong. Suppose message M only exists on the leader. After a follower syncs with the leader, a client connected to that follower will see M, but M only exists on two servers, not on a quorum of servers. If the new leader and that follower then fail, message M is lost, even though M was already seen by the client. 3. one solution: so I think UPTODATE should be sent to a follower only once a quorum of servers has synced with the leader. |
214215 | No Perforce job exists for this issue. | 0 | 32759 | 9 years ago | 0|i05yxr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1035 | CREATOR_ALL_ACL does not work together with IPAuthenticationProvider |
Bug | Open | Major | Unresolved | Unassigned | Christian Ziech | Christian Ziech | 28/Mar/11 04:36 | 07/Sep/11 10:49 | 3.3.1, 3.3.2 | server | 0 | 2 | ZOOKEEPER-1173 | We were trying to use the predefined ACL "Ids.CREATOR_ALL_ACL" together with the default ip authentication. Unfortunately it seems that this cannot work due to the implementation of the PrepRequestProcessor.fixupACL() method, which checks the return value of the AuthenticationProvider.isAuthenticated() method (the IPAuthenticationProvider in our case). Unfortunately this provider always returns false, which results in Ids.CREATOR_ALL_ACL always being rejected. |
2452 | No Perforce job exists for this issue. | 0 | 32760 | 9 years, 3 days ago | 0|i05yxz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1034 | perl bindings should automatically find the zookeeper c-client headers |
Bug | Closed | Minor | Fixed | Nicholas Harteau | Nicholas Harteau | Nicholas Harteau | 27/Mar/11 16:45 | 23/Nov/11 14:22 | 14/Aug/11 21:41 | 3.3.3 | 3.4.0 | contrib | 0 | 2 | Installing Net::ZooKeeper from cpan or the zookeeper distribution tarballs will always fail due to not finding c-client header files. In conjunction with ZOOKEEPER-1033 update perl bindings to look for c-client header files in INCDIR/zookeeper/ a.k.a. make installs of Net::ZooKeeper via cpan/cpanm/whatever *just work*, assuming you've already got the zookeeper c client installed. |
47507 | No Perforce job exists for this issue. | 4 | 32761 | 8 years, 32 weeks, 3 days ago | Net::ZooKeeper now looks in some sane places for the c client includes |
Reviewed
|
0|i05yy7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1033 | c client should install includes into INCDIR/zookeeper, not INCDIR/c-client-src |
Bug | Closed | Minor | Fixed | Nicholas Harteau | Nicholas Harteau | Nicholas Harteau | 27/Mar/11 16:40 | 23/Nov/11 14:22 | 04/May/11 02:03 | 3.3.3 | 3.4.0 | c client | 0 | 2 | ZOOKEEPER-494 | header files are installed into foo/include/c-client-src/, which doesn't indicate a relationship with zookeeper and doesn't correspond to foo/lib/libzookeeper* header files should be installed into foo/include/zookeeper/ as this is the common practice. |
47508 | No Perforce job exists for this issue. | 1 | 32762 | 8 years, 47 weeks, 1 day ago | Install c-client header files into include/zookeeper/ rather than include/c-client-src/ | 0|i05yyf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1032 | speed up recovery from leader failure |
Improvement | Open | Major | Unresolved | Unassigned | jiangwen wei | jiangwen wei | 27/Mar/11 06:03 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | server | 1 | 4 | When the number of nodes is large, it may take a long time to recover from a leader failure. There are some points to improve: 1. A follower should take its snapshot asynchronously once it is up to date. 2. Currently the Leader/Follower will clear the DataTree on leader failure and then restore it from a snapshot and transaction logs. The DataTree should not be cleared; it should only be caught up from the transaction logs. 3. FileTxnLog should keep recent transaction logs in memory, so that when the DataTree is not far behind the transaction logs, the in-memory logs can be used to restore it. |
2453 | No Perforce job exists for this issue. | 0 | 42072 | 9 years, 3 days ago | 0|i07kef: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1031 | Introduce virtual cluster IP and start that cluster IP on the host running ZK leader |
Wish | Open | Minor | Unresolved | Unassigned | Vishal Kher | Vishal Kher | 25/Mar/11 22:29 | 28/May/15 01:36 | 3.3.3 | 4.0.0 | leaderElection, quorum | 2 | 6 | It would be useful to enable a way to specify a virtual (floating) IP for the ZK cluster (say in zoo.cfg). The ZK leader will start this IP on one of its interfaces. If the leadership changes, the cluster IP will be taken over by the new leader. This IP can be used to identify the ZK leader and send administrative commands/query to the leader. For example, - a ZK client can get the list of ZK servers in the configuration by sending a request to the server running this IP address. The client just needs to know one IP address. Availability of cluster automatically ensures availability of the IP address. - To reconfigure ZK configuration, a client can send reconfig request to the server on this IP and keep retrying until the request succeeds or fails. Implementation issues: 1. The old ZK leader that has lost leadership should be able to somehow give up the virtual IP address. Otherwise, it could lead to collisions. One solution is to self reboot. A system property can be used to specify ways to unplumb the cluster IP 2. Cross-platform support. 3. Refreshing ARP caches |
2454 | No Perforce job exists for this issue. | 0 | 42073 | 4 years, 43 weeks ago | 0|i07ken: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1030 | Increase default for maxClientCnxns |
Improvement | Closed | Trivial | Fixed | Todd Lipcon | Todd Lipcon | Todd Lipcon | 25/Mar/11 18:37 | 04/Sep/14 21:26 | 08/Apr/11 19:41 | 3.2.2 | 3.4.0 | 0 | 3 | ZOOKEEPER-1073 | The default for maxClientCnxns is 10, which is too low for many applications. For example, HBase users often run MR jobs where each task needs to use ZooKeeper to talk to HBase. This means that each slot on the tasktracker will have at least one ZK connection. With today's beefy machines, that's easily 20+ connections per node. I would suggest bumping the default to 60, which will still protect against runaway nodes (eg a leak in a tight loop) but won't impact MR jobs that need to talk to ZK. |
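The setting in question lives in zoo.cfg; a minimal sketch of the proposed change (the surrounding values are common documented defaults, shown only for context, not taken from this report):

```
# zoo.cfg (illustrative)
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
# per-IP connection cap; the old default of 10 is too low for MR-style
# workloads, so the issue proposes raising it to 60
maxClientCnxns=60
```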
37454 | No Perforce job exists for this issue. | 2 | 30003 | 5 years, 28 weeks, 6 days ago |
Reviewed
|
0|i05hxr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1029 | C client bug in zookeeper_init (if bad hostname is given) |
Bug | Closed | Blocker | Fixed | Flavio Paiva Junqueira | Dheeraj Agrawal | Dheeraj Agrawal | 25/Mar/11 16:47 | 25/Dec/18 04:42 | 11/Dec/15 15:15 | 3.3.2, 3.4.6, 3.5.0 | 3.4.7, 3.5.2, 3.6.0 | c client | 3 | 18 | ZOOKEEPER-2443 | If you give invalid hostname to zookeeper_init method, it's not able to resolve it, and it tries to do the cleanup (free buffer/completion lists/etc) . The adaptor_init() is not called for this code path, so the lock,cond variables (for adaptor, completion lists) are not initialized. As part of the cleanup it's trying to clean up some buffers and acquires locks and unlocks (where the locks have not yet been initialized, so unlocking fails) lock_completion_list(&zh->sent_requests); - pthread_mutex/cond not initialized tmp_list = zh->sent_requests; zh->sent_requests.head = 0; zh->sent_requests.last = 0; unlock_completion_list(&zh->sent_requests); trying to broadcast here on uninitialized cond It should do error checking to see if locking succeeds before unlocking it. If Locking fails, then appropriate error handling has to be done. |
2455 | No Perforce job exists for this issue. | 8 | 32763 | 1 year, 12 weeks, 2 days ago |
Reviewed
|
c client, adaptor_init, zookeeper_init, bad hostname | 0|i05yyn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
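The failure mode in ZOOKEEPER-1029 above is a teardown path that touches locks which were never initialized. A minimal Python sketch of the guarded-cleanup pattern (Handle, adaptor_init and zookeeper_init here are illustrative stand-ins for the C client's structures, not its real API):

```python
import threading

class Handle:
    """Illustrative stand-in for the C client's zhandle_t."""
    def __init__(self):
        self.lock = None           # created only once adaptor_init() runs
        self.sent_requests = []

    def adaptor_init(self):
        # In the C client this initializes the mutexes/condition variables.
        self.lock = threading.Lock()

    def destroy(self):
        # Guarded cleanup: never lock/unlock primitives that were
        # never initialized (the bug in the failed-resolution path).
        if self.lock is not None:
            with self.lock:
                self.sent_requests.clear()
        else:
            self.sent_requests.clear()

def zookeeper_init(host, handle):
    """Sketch of the init flow: hostname resolution happens before
    adaptor_init(), so the early-error cleanup must stay guarded."""
    if not host:                   # stand-in for a failed hostname lookup
        handle.destroy()           # cleanup path taken before adaptor_init()
        return None
    handle.adaptor_init()
    return handle
```

The point is the `if self.lock is not None` guard: the C fix likewise has to check what was actually initialized before acquiring or broadcasting on it.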
| ZooKeeper | ZOOKEEPER-1028 | In python bindings, zookeeper.set2() should return a stat dict but instead returns None |
Bug | Closed | Minor | Fixed | Chris Medaglia | Chris Medaglia | Chris Medaglia | 24/Mar/11 16:26 | 23/Nov/11 14:22 | 06/Apr/11 16:22 | 3.3.3 | 3.4.0 | contrib-bindings | 0 | 3 | 3600 | 3600 | 0% | All environments. | There is a small bug in the python bindings, specifically with the zookeeper.set2() call. This method should return a stat dictionary, but actually returns None. The fix is a one-character change to zookeeper.c such that the return value is '&stat' rather than 'stat'. | 0% | 0% | 3600 | 3600 | patch | 47509 | No Perforce job exists for this issue. | 2 | 32764 | 8 years, 51 weeks ago |
Reviewed
|
0|i05yyv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1027 | chroot not transparent in zoo_create() |
Bug | Closed | Critical | Fixed | Thijs Terlouw | Thijs Terlouw | Thijs Terlouw | 24/Mar/11 01:06 | 28/Sep/15 13:33 | 25/Jul/11 13:45 | 3.3.3 | 3.4.0 | c client | 0 | 5 | ZOOKEEPER-1150 | ZOOKEEPER-995, ZOOKEEPER-2282 | Linux, ZooKeeper 3.3.3, C-client, java 1.6.0_17-b04, hotspot server vm | I've recently started to use the chroot functionality (introduced in 3.2.0) as part of my connect string. It mostly works as expected, but there is one case that is unexpected: when I create a path with zoo_create() I can retrieve the created path. This is very useful when you set the ZOO_SEQUENCE flag. Unfortunately the returned path includes the chroot as part of the path. This was unexpected to me: I expected that the chroot would be totally transparent. The documentation for zoo_create() says: "path_buffer : Buffer which will be filled with the path of the new node (this might be different than the supplied path because of the ZOO_SEQUENCE flag)." This gave me the impression that this flag is the only reason the returned path is different from the created path, but apparently it's not. Is this a bug or intended behavior? I work around this issue now by remembering the chroot in my wrapper code, and after a call to zoo_create() I check if the returned path starts with the chroot. If it does, I remove it. My use case is to create a path with a sequence number and then delete this path later. Unfortunately I cannot delete the path because it has the chroot prepended to it, and thus it will result in two chroots. I believe this only affects the create functions. |
47510 | No Perforce job exists for this issue. | 5 | 32765 | 8 years, 33 weeks, 1 day ago | Correctly removes the chroot from the returned path in a call to zoo_create() |
Reviewed
|
chroot zookeeper zoo_create | 0|i05yz3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
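The workaround the reporter of ZOOKEEPER-1027 describes (strip the chroot prefix from the path returned by zoo_create()) can be sketched in Python; strip_chroot is a hypothetical helper, not part of any ZooKeeper binding:

```python
def strip_chroot(returned_path, chroot):
    """Remove the chroot prefix from a server-returned path, mirroring
    the reporter's wrapper-level workaround."""
    if not chroot or chroot == "/":
        return returned_path
    if returned_path == chroot:
        return "/"
    if returned_path.startswith(chroot + "/"):
        return returned_path[len(chroot):]
    return returned_path          # path outside the chroot: leave untouched
```

The prefix check matters: a naive `replace(chroot, "")` could also mangle a child whose name merely contains the chroot string.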
| ZooKeeper | ZOOKEEPER-1026 | Sequence number assignment decreases after old node rejoins cluster |
Bug | Open | Major | Unresolved | Unassigned | Jeremy Stribling | Jeremy Stribling | 22/Mar/11 13:16 | 25/Mar/11 14:25 | 3.3.3 | server | 0 | 1 | I ran into a weird case where a Zookeeper server rejoins the cluster after missing several operations, and then a client creates a new sequential node that has a number earlier than the last node it created. I don't have full logs, or a live system in this state, or any data directories, just some partial server logs and the evidence as seen by the client. Haven't tried reproducing it yet, just wanted to see if anyone here had any ideas. Here's the scenario (probably more info than necessary, but trying to be complete) 1) Initially (5:37:20): 3 nodes up, with ids 215, 126, and 37 (called nodes #1, #2, and #3 below): 2) Nodes periodically (and throughout this whole timeline) create sequential, non-ephemeral nodes under the /zkrsm parent node. 3) 5:46:57: Node #1 gets notified of /zkrsm/0000000000000000_record0000002116 4) 5:47:06: Node #1 restarts and rejoins 5) 5:49:26: Node #2 gets notified of /zkrsm/0000000000000000_record0000002708 6) 5:49:29: Node #2 restarts and rejoins 7) 5:52:01: Node #3 gets notified of /zkrsm/0000000000000000_record0000003291 8) 5:52:02: Node #3 restarts and begins the rejoining process 9) 5:52:08: Node #1 successfully creates /zkrsm/0000000000000000_record0000003348 10) 5:52:08: Node #2 dies after getting notified of /zkrsm/0000000000000000_record0000003348 11) 5:52:10ish: Node #3 is elected leader (the ZK server log doesn't have wallclock timestamps, so not exactly sure on the ordering of this step) 12) 5:52:15: Node #1 successfully creates /zkrsm/0000000000000000_record0000003292 Note that the node created in step #12 is lower than the one created in step #9, and is exactly one greater than the last node seen by node #3 before it restarted. 
Here is the sequence of session establishments as seen from the C client of node #1 after its restart (the IP address of node #1=13.0.0.11, #2=13.0.0.12, #3=13.0.0.13): 2011-03-18 05:46:59,838:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec780a3020000, negotiated timeout=6000 2011-03-18 05:49:32,194:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.13:2888], sessionId=0x252ec782f5100002, negotiated timeout=6000 2011-03-18 05:52:02,352:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.12:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000 2011-03-18 05:52:08,583:17454(0x7fc57d3db710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0x7e2ec782ff5f0001, negotiated timeout=6000 2011-03-18 05:52:13,834:17454(0x7fc57cbda710):ZOO_INFO@check_events@1632: session establishment complete on server [13.0.0.11:2888], sessionId=0xd72ec7856d0f0001, negotiated timeout=6000 I will attach logs for all nodes after each of their restarts, and a partial log for node #3 from before its restart. |
2456 | No Perforce job exists for this issue. | 1 | 32766 | 9 years, 6 days ago | sequential | 0|i05yzb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1025 | zkCli is overly sensitive to spaces. |
Improvement | Closed | Major | Fixed | Laxman | Jonathan Hsieh | Jonathan Hsieh | 21/Mar/11 20:40 | 23/Nov/11 14:22 | 18/Aug/11 16:20 | 3.3.3, 3.4.0 | 3.4.0 | java client | 0 | 2 | ZOOKEEPER-997 | Here's an example: I do an ls to get znode names. I try to stat a znode. {code} [zk: localhost:3181(CONNECTED) 1] ls /flume-nodes [nodes0000000002, nodes0000000001, nodes0000000000, nodes0000000005, nodes0000000004, nodes0000000003] [zk: localhost:3181(CONNECTED) 3] stat /flume-nodes/nodes0000000002 cZxid = 0xb ctime = Sun Mar 20 23:24:03 PDT 2011 ... (success) {code} Here's something that almost looks the same. Notice the extra space in front of the znode name. {code} [zk: localhost:3181(CONNECTED) 2] stat /flume-nodes/nodes0000000002 Command failed: java.lang.IllegalArgumentException: Path length must be > 0 {code} This seems like unexpected behavior. |
40928 | No Perforce job exists for this issue. | 2 | 33343 | 8 years, 31 weeks, 6 days ago | 0|i062jj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
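For ZOOKEEPER-1025 above, the fix amounts to tokenizing the command line in a whitespace-tolerant way. A Python sketch of the idea (parse_command is illustrative; zkCli itself is Java):

```python
def parse_command(line):
    """Whitespace-tolerant split of a zkCli-style command line.
    str.split() with no separator collapses runs of spaces and ignores
    leading/trailing blanks, so ' stat  /path ' parses the same as
    'stat /path' instead of producing an empty-path argument."""
    tokens = line.split()
    if not tokens:
        return None, []          # blank input: no command, no arguments
    return tokens[0], tokens[1:]
```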
| ZooKeeper | ZOOKEEPER-1024 | let path be binary |
New Feature | Open | Major | Unresolved | Unassigned | jiangwen wei | jiangwen wei | 19/Mar/11 06:20 | 02/May/11 23:42 | server | 0 | 1 | Let the path be binary, not a string. There is overhead in holding strings, and the overhead is obvious when there are millions of nodes. Sometimes ZK is used as a highly available metadata database; some data is binary, and converting it to strings also adds obvious overhead. |
2457 | No Perforce job exists for this issue. | 0 | 42074 | 8 years, 47 weeks, 2 days ago | 0|i07kev: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1023 | zkpython: add_auth can deadlock the interpreter |
Bug | Open | Minor | Unresolved | Unassigned | Botond Hejj | Botond Hejj | 19/Mar/11 06:20 | 05/Feb/20 07:16 | 3.3.2 | 3.7.0, 3.5.8 | contrib-bindings | 0 | 0 | If the add_auth method has a callback and we execute another command just after it, then we can deadlock the python api. Example: def deadlock(a, b): pass def watcher(zh, type, state, path): if(state == zookeeper.CONNECTED_STATE): zookeeper.add_auth(zh, 'test', 'test', deadlock) zookeeper.get_children(zh, '/') zh = zookeeper.init("host:port", watcher) Looking at the code the problem looks like the following: the get_children sync call is running on the main thread and holds the GIL; it blocks until get_children has finished. Meanwhile, on the other thread, the callback of add_auth is called, and that tries to get the GIL to call the python callback. So this thread is waiting for the main thread to release the GIL, but the main thread is waiting for the other thread to process the reply of get_children. I am not an expert on the python binding, but I think it can be solved if the GIL were released before synchronous c api calls. |
2458 | No Perforce job exists for this issue. | 1 | 32767 | 9 years, 2 days ago | 0|i05yzj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
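The deadlock in ZOOKEEPER-1023 above can be reproduced in miniature by modelling the GIL as an ordinary lock: the main thread holds it while waiting for the I/O thread, and the I/O thread needs it to deliver the add_auth callback. This sketch uses a timeout so it demonstrates the contention without actually hanging:

```python
import threading

# Model the GIL as a plain lock. The main thread holds it while it
# blocks on the I/O thread (the synchronous get_children call); the
# I/O thread needs it to run the add_auth completion callback.
gil = threading.Lock()
callback_ran = []

def io_thread():
    # add_auth completion: tries to take the "GIL" to call into Python.
    # In the real bug this blocks forever; here we time out to observe it.
    got_it = gil.acquire(timeout=0.5)
    callback_ran.append(got_it)
    if got_it:
        gil.release()

gil.acquire()             # main thread enters the sync call holding the GIL
t = threading.Thread(target=io_thread)
t.start()
t.join()                  # main "waits for the reply" without releasing it
gil.release()

print(callback_ran[0])    # False: the callback could never get the GIL
```

The reporter's suggested fix corresponds to releasing `gil` before the `t.join()` line, which lets the callback run and the reply be processed.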
| ZooKeeper | ZOOKEEPER-1022 | Keep the children under a ZNode in order. |
New Feature | Open | Major | Unresolved | Unassigned | jiangwen wei | jiangwen wei | 19/Mar/11 06:13 | 29/Apr/11 11:34 | server | 2 | 3 | ZOOKEEPER-423 | Keep the children under a ZNode in order, and let the user specify a comparator for each parent ZNode. Sometimes we only need to get some of the children, not all of them, such as the first child. Some applications can also leverage the order; for example, in HBase the meta table could be put into ZK. |
2459 | No Perforce job exists for this issue. | 0 | 42075 | 9 years, 2 days ago | 0|i07kf3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1020 | Implement function in C client to determine which host you're currently connected to. |
New Feature | Closed | Minor | Fixed | Stephen Tyree | Stephen Tyree | Stephen Tyree | 15/Mar/11 10:39 | 23/Nov/11 14:22 | 16/Mar/11 21:01 | 3.4.0 | c client | 0 | 0 | On occasion it might be useful to determine which host your Zookeeper client is currently connected to, be it for debugging purposes or otherwise. A possible signature for that function: const char* zoo_get_connected_host(zhandle_t *zh, char *buffer, size_t buffer_size, unsigned short *port); Clients could use it like below: char buffer[33]; unsigned short port = 0; if (!zoo_get_connected_host(zh, buffer, sizeof(buffer), &port)) return EXIT_FAILURE; printf("The connected host is: %s:%d\n", buffer, port); |
47511 | No Perforce job exists for this issue. | 1 | 33344 | 9 years, 2 weeks ago |
Reviewed
|
0|i062jr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1019 | zkfuse doesn't list dependency on boost in README |
Improvement | Closed | Major | Fixed | Raúl Gutiérrez Segalés | Karel Vervaeke | Karel Vervaeke | 15/Mar/11 09:48 | 13/Mar/14 14:17 | 10/Dec/13 15:45 | 3.4.0 | 3.4.6, 3.5.0 | contrib | 0 | 5 | 300 | 300 | 0% | The README.txt under contrib/fuse doesn't list boost under Development build libraries< | 0% | 0% | 300 | 300 | 2460 | No Perforce job exists for this issue. | 1 | 42076 | 6 years, 2 weeks ago | 0|i07kfb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1018 | The connection permutation in get_addrs uses a weak and inefficient shuffle |
Improvement | Closed | Minor | Fixed | Stephen Tyree | Stephen Tyree | Stephen Tyree | 15/Mar/11 08:47 | 23/Nov/11 14:22 | 04/Apr/11 17:09 | 3.3.2 | 3.4.0 | c client | 0 | 0 | 7200 | 7200 | 0% | ZOOKEEPER-989 | After determining all of the addresses in the get_addrs function in the C client, the connection is permuted using the following code: setup_random(); /* Permute */ for(i = 0; i < zh->addrs_count; i++) { struct sockaddr_storage *s1 = zh->addrs + random()%zh->addrs_count; struct sockaddr_storage *s2 = zh->addrs + random()%zh->addrs_count; if (s1 != s2) { struct sockaddr_storage t = *s1; *s1 = *s2; *s2 = t; } } Not only does this shuffle produce an uneven permutation, but it is half as efficient as the Fisher-Yates shuffle which produces an unbiased one. It seems like it would be a simple fix to increase the randomness and efficiency of the shuffle by switching over to using Fisher-Yates. |
0% | 0% | 7200 | 7200 | 47512 | No Perforce job exists for this issue. | 1 | 33345 | 8 years, 51 weeks, 2 days ago |
Reviewed
|
0|i062jz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
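ZOOKEEPER-1018 above proposes replacing the pairwise random-swap loop with a Fisher-Yates shuffle, which does one swap per element and yields every permutation with equal probability. A Python sketch of the suggested replacement (the C fix would be the same loop over `zh->addrs`):

```python
import random

def fisher_yates(addrs, rng=random):
    """Unbiased in-place shuffle: one rng call and one swap per element,
    versus two rng calls per element (and a biased result) in the
    random-swap loop quoted in the issue."""
    for i in range(len(addrs) - 1, 0, -1):
        j = rng.randrange(i + 1)   # uniform over [0, i]
        addrs[i], addrs[j] = addrs[j], addrs[i]
    return addrs
```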
| ZooKeeper | ZOOKEEPER-1017 | Follower.followLeader throws SocketException, then shutdown Follower |
Bug | Open | Major | Unresolved | Unassigned | tom liu | tom liu | 15/Mar/11 05:59 | 15/Mar/11 05:59 | 3.3.3 | quorum | 0 | 1 | JDK1.6.0_17/CentOS5.5 | I use three nodes to deploy a ZK cluster, but a follower node throws SocketException twice every day. 2011-03-15 14:15:48,260 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@90] - Exception when following the leader java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:126) at org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:361) at org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:116) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:80) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644) I found that the reason is that the Follower does not respond to the Leader's ping in time, so I added some logs. 
Finally, I found that in org.apache.zookeeper.server.SyncRequestProcessor: {noformat} public void processRequest(Request request) { // request.addRQRec(">sync"); //TODO tom liu added if(LOG.isDebugEnabled()) { LOG.debug("Processing request::" + request); } queuedRequests.add(request); //TODO tom liu added if(LOG.isDebugEnabled()) { LOG.debug("Processing request::" + request); } } {noformat} That log is: 2011-03-15 14:15:34,515 - DEBUG [QuorumPeer:/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@189] - Processing request::sessionid:0x22e9907b5d50000 type:setData cxid:0x70b55 zxid:0xd50000a73f txntype:5 reqpath:n/a 2011-03-15 14:15:48,259 - DEBUG [QuorumPeer:/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@194] - Processing request::sessionid:0x22e9907b5d50000 type:setData cxid:0x70b55 zxid:0xd50000a73f txntype:5 reqpath:n/a So: the elapsed time was 13744 ms, the LearnerHandler's ia.readRecord timed out in its run method, then the Leader shut down and a leader re-election took place. My question is: why does the queuedRequests.add statement take so long? |
2461 | No Perforce job exists for this issue. | 0 | 32768 | 9 years, 2 weeks, 2 days ago | 0|i05yzr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1015 | DateFormat.getDateTimeInstance() is very expensive, we can cache it to improve performance |
Bug | Patch Available | Major | Unresolved | Bill Havanki | Xiaoming Shi | Xiaoming Shi | 12/Mar/11 22:00 | 02/Mar/16 21:44 | 3.3.2 | server | 0 | 2 | ZOOKEEPER-1014 | In the file {noformat} ./zookeeper-3.3.2/src/java/main/org/apache/zookeeper/server/PurgeTxnLog.java line:103 {noformat} DateFormat.getDateTimeInstance() is called many times in the for loop. We can cache the result and improve the performance This is similar to the Apache bug https://issues.apache.org/bugzilla/show_bug.cgi?id=48778 Similar code can be found: {noformat} ./zookeeper-3.3.2/src/java/main/org/apache/zookeeper/server/TraceFormatter.java ./zookeeper-3.3.2/src/java/main/org/apache/zookeeper/server/LogFormatter.java {noformat} |
newbie | 2463 | No Perforce job exists for this issue. | 1 | 32769 | 4 years, 3 weeks ago | 0|i05yzz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1014 | DateFormat.getDateTimeInstance() is very expensive, we can cache it to improve performance |
Bug | Resolved | Major | Duplicate | Unassigned | Xiaoming Shi | Xiaoming Shi | 12/Mar/11 12:42 | 14/Mar/11 23:51 | 14/Mar/11 23:51 | 3.3.2 | server | 0 | 0 | ZOOKEEPER-1015 | In the file: {noformat} ./zookeeper-3.3.2/src/java/main/org/apache/zookeeper/server/TraceFormatter.java {noformat} DateFormat.getDateTimeInstance() is called in the while loop. We can cache the return value, and improve performance. This is similar to the Apache Bug https://issues.apache.org/bugzilla/show_bug.cgi?id=48778 |
214214 | No Perforce job exists for this issue. | 0 | 32770 | 9 years, 2 weeks, 2 days ago | 0|i05z07: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1013 | zkServer.sh usage message should mention all startup options |
Bug | Closed | Trivial | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 11/Mar/11 15:28 | 23/Nov/11 14:22 | 15/Mar/11 14:39 | 3.4.0 | server | 0 | 1 | 300 | 300 | 0% | currently the "Usage" message for zkServer shows: echo "Usage: $0 {start|stop|restart|status}" But it seems to me that it should show the other startup options as well, which are currently: start-foreground, upgrade, print-cmd. |
0% | 0% | 300 | 300 | 47513 | No Perforce job exists for this issue. | 1 | 32771 | 9 years, 2 weeks, 1 day ago | patch to zkServer.sh to show all startup options | 0|i05z0f: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1012 | support distinct JVMFLAGS for zookeeper server in zkServer.sh and zookeeper client in zkCli.sh |
New Feature | Closed | Trivial | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 11/Mar/11 15:12 | 26/Jan/12 20:58 | 16/Mar/11 13:17 | 3.4.0 | server | 0 | 0 | 300 | 300 | 0% | ZOOKEEPER-1376 | 1. Sometimes you might want to run zkServer.sh with different JVMFLAGS than for clients. Make zkServer.sh consult the SERVER_JVMFLAGS variable and, if it exists, add it to the beginning of the existing JVMFLAGS setting. 2. Sometimes you might want to run zkCli.sh with different JVMFLAGS than for servers. Make zkCli.sh consult the CLIENT_JVMFLAGS variable and, if it exists, add it to the beginning of the existing JVMFLAGS setting. |
0% | 0% | 300 | 300 | 47514 | No Perforce job exists for this issue. | 1 | 33346 | 9 years, 2 weeks ago |
Reviewed
|
0|i062k7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1011 | fix Java Barrier Documentation example's race condition issue and polish up the Barrier Documentation |
Bug | Open | Major | Unresolved | maoling | Semih Salihoglu | Semih Salihoglu | 09/Mar/11 04:49 | 20/Jan/19 07:12 | documentation | 1 | 6 | 0 | 3000 | There is a race condition in the Barrier example of the java doc: http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html. It's in the enter() method. Here's the original example: boolean enter() throws KeeperException, InterruptedException{ zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL); while (true) { synchronized (mutex) { List<String> list = zk.getChildren(root, true); if (list.size() < size) { mutex.wait(); } else { return true; } } } } Here's the race condition scenario: Let's say there are two machines/nodes: node1 and node2 that will use this code to synchronize over ZK. Let's say the following steps take place: node1 calls the zk.create method and then reads the number of children, and sees that it's 1 and starts waiting. node2 calls the zk.create method (doesn't call the zk.getChildren method yet, let's say it's very slow) node1 is notified that the number of children on the znode changed, it checks that the size is 2 so it leaves the barrier, it does its work and then leaves the barrier, deleting its node. node2 calls zk.getChildren and because node1 has already left, it sees that the number of children is equal to 1. Since node1 will never enter the barrier again, it will keep waiting. --- End of scenario --- Here's Flavio's fix suggestions (copying from the email thread): ... I see two possible action points out of this discussion: 1- State clearly in the beginning that the example discussed is not correct under the assumption that a process may finish the computation before another has started, and the example is there for illustration purposes; 2- Have another example following the current one that discusses the problem and shows how to fix it. 
This is an interesting option that illustrates how one could reason about a solution when developing with zookeeper. ... We'll go with the 2nd option. |
100% | 100% | 3000 | 0 | pull-request-available | 2464 | No Perforce job exists for this issue. | 0 | 32772 | 1 year, 17 weeks ago | 0|i05z0n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
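A sketch of the fix direction discussed in ZOOKEEPER-1011 above (Flavio's option 2): instead of waiting on the child count, which a fast process can shrink again by leaving, participants wait on a persistent "ready" flag that is set once and never unset. This in-memory model replaces znodes with a set and a Condition; it is illustrative, not the tutorial's actual ZooKeeper code:

```python
import threading

class Barrier:
    """In-memory model of the fixed enter(): a persistent 'ready' flag
    (a znode in the real recipe) survives participants that have already
    left, so a slow process cannot wait forever on a shrinking count."""
    def __init__(self, size):
        self.size = size
        self.children = set()            # stands in for getChildren(root)
        self.ready = False               # stands in for a root + "/ready" znode
        self.cond = threading.Condition()

    def enter(self, name):
        with self.cond:
            self.children.add(name)      # create(root + "/" + name, EPHEMERAL)
            if len(self.children) >= self.size:
                self.ready = True        # create(root + "/ready") -- set once
                self.cond.notify_all()
            while not self.ready:        # wait on /ready, not on list.size()
                self.cond.wait()

    def leave(self, name):
        with self.cond:
            self.children.discard(name)  # delete(...) -- 'ready' stays set
```

In the race scenario above, node2 would now block on the missing /ready flag rather than on a child count that node1's departure has already reduced, and it proceeds as soon as it observes the flag, even if node1 has long since left.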
| ZooKeeper | ZOOKEEPER-1010 | ZOOKEEPER-850 Remove or move ManagedUtil to contrib, because it has direct log4j dependencies |
Sub-task | Resolved | Major | Duplicate | Unassigned | Olaf Krische | Olaf Krische | 08/Mar/11 16:37 | 25/Apr/12 19:48 | 25/Apr/12 19:48 | 3.3.1 | java client | 1 | 2 | Please move ManagedUtil out of the way. It has direct dependencies on log4j api. | 2465 | No Perforce job exists for this issue. | 0 | 33347 | 7 years, 48 weeks, 1 day ago | 0|i062kf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1008 | ZK should give more specific error on missing myid |
Improvement | Open | Minor | Unresolved | Unassigned | Eric Sammer | Eric Sammer | 07/Mar/11 12:48 | 05/Sep/11 07:50 | 3.3.2 | server | 0 | 1 | On startup, ZK should specifically test for and provide an error message if the myid file is missing. Currently, the error message is simply "Invalid config" if myid is missing. | 2466 | No Perforce job exists for this issue. | 0 | 42078 | 8 years, 29 weeks, 3 days ago | 0|i07kfr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
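The improvement requested in ZOOKEEPER-1008 above, sketched in Python: test for the myid file explicitly and raise a specific error instead of the generic "Invalid config" (read_myid is a hypothetical helper; the real server is Java):

```python
import os

def read_myid(data_dir):
    """Hypothetical helper: check for dataDir/myid explicitly so the
    operator sees a specific message, not just "Invalid config"."""
    path = os.path.join(data_dir, "myid")
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "myid file is missing: expected it at %s; create it containing "
            "this server's numeric id" % path)
    with open(path) as f:
        return int(f.read().strip())
```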
| ZooKeeper | ZOOKEEPER-1007 | iarchive leak in C client |
Bug | Closed | Minor | Fixed | Jeremy Stribling | Jeremy Stribling | Jeremy Stribling | 04/Mar/11 16:42 | 23/Nov/11 14:22 | 15/Mar/11 16:42 | 3.3.3 | 3.4.0 | c client | 0 | 1 | On line 1957, zookeeper_process() returns without cleaning up the "ia" buffer that was previously allocated. I don't know how often this code path is taken, but I thought it was worth reporting. I will attach a simple patch shortly. | 47515 | No Perforce job exists for this issue. | 2 | 32773 | 9 years, 2 weeks, 1 day ago |
Reviewed
|
0|i05z0v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1006 | QuorumPeer "Address already in use" -- regression in 3.3.3 |
Bug | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 03/Mar/11 12:38 | 23/Nov/11 14:22 | 27/Jul/11 13:21 | 3.3.3 | 3.3.4, 3.4.0 | tests | 0 | 1 | CnxManagerTest.testWorkerThreads See attachment, this is the first time I've seen this test fail, and it's failed 2 out of the last three test runs. Notice (attachment) once this happens the port never becomes available. {noformat} 2011-03-02 15:53:12,425 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn$Factory@251] - Accepted socket connection from /172.29.6.162:51441 2011-03-02 15:53:12,430 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn@639] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running 2011-03-02 15:53:12,430 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11245:NIOServerCnxn@1435] - Closed socket connection for client /172.29.6.162:51441 (no session established for client) 2011-03-02 15:53:12,430 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:11241:Follower@82] - Exception when following the leader java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375) at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84) at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108) at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148) at org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:267) at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:66) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645) 2011-03-02 15:53:12,431 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11241:Follower@165] - shutdown called java.lang.Exception: shutdown Follower at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649) 2011-03-02 
15:53:12,432 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11241:QuorumPeer@621] - LOOKING 2011-03-02 15:53:12,432 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11241:FastLeaderElection@663] - New election. My id = 0, Proposed zxid = 0 2011-03-02 15:53:12,433 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state) 2011-03-02 15:53:12,433 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state) 2011-03-02 15:53:12,433 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state) 2011-03-02 15:53:12,633 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 0 (n.leader), 0 (n.zxid), 2 (n.round), LOOKING (n.state), 0 (n.sid), LOOKING (my state) 2011-03-02 15:53:12,633 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11245:QuorumPeer@655] - LEADING 2011-03-02 15:53:12,636 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11245:Leader@54] - TCP NoDelay set to: true 2011-03-02 15:53:12,638 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:11245:ZooKeeperServer@151] - Created server with tickTime 1000 minSessionTimeout 2000 maxSessionTimeout 20000 datadir /var/lib/hudson/workspace/CDH3-ZooKeeper-3.3.3_sles/build/test/tmp/test9001250572426375869.junit.dir/version-2 snapdir /var/lib/hudson/workspace/CDH3-ZooKeeper-3.3.3_sles/build/test/tmp/test9001250572426375869.junit.dir/version-2 2011-03-02 15:53:12,639 - ERROR [QuorumPeer:/0:0:0:0:0:0:0:0:11245:Leader@133] - Couldn't bind to port 11245 java.net.BindException: Address already in use at java.net.PlainSocketImpl.socketBind(Native Method) at java.net.PlainSocketImpl.bind(PlainSocketImpl.java:365) at java.net.ServerSocket.bind(ServerSocket.java:319) at java.net.ServerSocket.<init>(ServerSocket.java:185) at java.net.ServerSocket.<init>(ServerSocket.java:97) at 
org.apache.zookeeper.server.quorum.Leader.<init>(Leader.java:131) at org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:512) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:657) {noformat} |
47516 | No Perforce job exists for this issue. | 4 | 32774 | 8 years, 35 weeks, 1 day ago | turns out this is a bug in the test, the supplied patch fixes the problem by using polling rather than straight sleep. |
Reviewed
|
0|i05z13: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1005 | Zookeeper servers fail to elect a leader successfully. |
Bug | Open | Major | Unresolved | Unassigned | Alexandre Hardy | Alexandre Hardy | 01/Mar/11 11:00 | 05/Feb/20 07:16 | 3.2.2 | 3.7.0, 3.5.8 | quorum | 1 | 3 | zookeeper-3.2.2; debian | We were running 3 zookeeper servers, and simulated a failure on one of the servers. The one zookeeper node follows the other, but has trouble connecting. It looks like the following exception is the cause: {noformat} 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumPeer] FOLLOWING 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.ZooKeeperServer] Created server 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Following zookeeper3/192.168.131.11:2888 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Unexpected exception, tries=0 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING java.net.ConnectException: -- Connection refused 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.PlainSocketImpl.socketConnect(Native Method) 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310) 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176) 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163) 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at java.net.Socket.connect(Socket.java:546) 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156) 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-60 WARNING -- at 
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549) {noformat} The last exception while connecting was: {noformat} 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Unexpected exception 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR java.net.ConnectException: -- Connection refused 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.PlainSocketImpl.socketConnect(Native Method) 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:310) 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:176) 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:163) 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:384) 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at java.net.Socket.connect(Socket.java:546) 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:156) 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 ERR -- at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:549) 2011-03-01T14:02:33+02:00 e0-cb-4e-65-4d-60 WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.Follower] Exception when following the leader {noformat} The leader started leading a bit later {noformat} 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Notification: 0, 94489312534, 25, 2, LOOKING, LOOKING, 0 2011-03-01T14:02:29+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Adding vote 2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at 
election address zookeeper2/192.168.132.10:3888 2011-03-01T14:02:32+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323) 2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumPeer] LEADING {noformat} But at that time the follower had already terminated and started a new election, so the leader failed: {noformat} 2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.ZooKeeperServer] Created server 2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.persistence.FileSnap] Reading snapshot /var/lib/zookeeper/version-2/snapshot.1600007d16 2011-03-01T14:02:50+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.persistence.FileTxnSnapLog] Snapshotting: 1600007d16 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at election address zookeeper2/192.168.132.10:3888 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323) 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302) 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323) 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296) 2011-03-01T14:02:53+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification. 
2011-03-01T14:03:11+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification. 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING [zookeeper] -- [org.apache.zookeeper.server.quorum.QuorumCnxManager] Cannot open channel to 1 at election address zookeeper2/192.168.132.10:3888 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323) 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:302) 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:323) 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d WARNING -- at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:296) 2011-03-01T14:03:14+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification. 2011-03-01T14:03:32+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.FastLeaderElection] Sending new notification. 
2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO [zookeeper] -- [org.apache.zookeeper.server.quorum.Leader] Shutdown called 2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO -- at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:371) 2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO -- at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:297) 2011-03-01T14:03:34+02:00 e0-cb-4e-65-4d-7d INFO -- at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:562) {noformat} From http://zookeeper.apache.org/doc/r3.2.2/zookeeperStarted.html: {quote} The new entry, initLimit is timeouts ZooKeeper uses to limit the length of time the ZooKeeper servers in quorum have to connect to a leader {quote} Since we have initLimit=10 and tickTime=4000, we should have 40 seconds for a zookeeper server to contact the leader. However, in the source code src/java/main/org/apache/zookeeper/server/quorum/Follower.java: {noformat} 152 for (int tries = 0; tries < 5; tries++) { 153 try { 154 //sock = new Socket(); 155 //sock.setSoTimeout(self.tickTime * self.initLimit); 156 sock.connect(addr, self.tickTime * self.syncLimit); 157 sock.setTcpNoDelay(nodelay); 158 break; 159 } catch (IOException e) { 160 if (tries == 4) { 161 LOG.error("Unexpected exception",e); 162 throw e; 163 } else { 164 LOG.warn("Unexpected exception, tries="+tries,e); 165 sock = new Socket(); 166 sock.setSoTimeout(self.tickTime * self.initLimit); 167 } 168 } 169 Thread.sleep(1000); 170 } {noformat} It appears as if we only have 4 seconds to contact the leader. The timeouts are applied to the socket, but do not take into account that the zookeeper leader may not have started its zookeeper service yet. Is this the expected behaviour? Or is the expected behaviour that followers should always be able to connect to the leader? |
2467 | No Perforce job exists for this issue. | 0 | 32775 | 8 years, 41 weeks, 1 day ago | 0|i05z1b: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
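The report above argues that five fixed attempts of `tickTime * syncLimit` each defeat the documented `initLimit` budget. A hypothetical sketch of a deadline-based alternative (this is illustrative, not ZooKeeper's actual `Follower` code; `Connector`, `followLeader`, and the pause length are stand-ins):

```java
import java.io.IOException;

public class LeaderConnectRetry {

    interface Connector {
        void connect(int timeoutMs) throws IOException;
    }

    /** Keep retrying until the full tickTime * initLimit budget is spent,
     *  instead of giving up after a fixed number of attempts. */
    static void followLeader(Connector sock, int tickTime, int initLimit,
                             long retryPauseMs) throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + (long) tickTime * initLimit;
        IOException last = null;
        do {
            try {
                sock.connect(tickTime * initLimit);
                return;                   // connected
            } catch (IOException e) {
                last = e;                 // leader may not be serving yet; pause and retry
                Thread.sleep(retryPauseMs);
            }
        } while (System.currentTimeMillis() < deadline);
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Demo: a "leader" that refuses the first two attempts, then accepts.
        int[] calls = {0};
        Connector flaky = timeoutMs -> {
            if (++calls[0] < 3) throw new IOException("Connection refused");
        };
        followLeader(flaky, 100, 10, 20); // 1s budget, 20ms pause between tries
        System.out.println("connected after " + calls[0] + " attempts");
    }
}
```

With the reporter's settings (tickTime=4000, initLimit=10) such a loop would retry for the full 40 seconds, covering the window where the elected leader has not yet opened its port.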
| ZooKeeper | ZOOKEEPER-1004 | TestClient.cc:363: Assertion: equality assertion failed |
Bug | Open | Major | Unresolved | Unassigned | Eugene Joseph Koontz | Eugene Joseph Koontz | 28/Feb/11 19:14 | 05/Dec/11 17:57 | 0 | 0 | Jenkins (Hudson) shows an error when running test-cppunit. I am not able to replicate this error on my own build machine, so I am unable to diagnose. Perhaps someone with access to the Apache Jenkins setup can help diagnose it. Please see the attached output from https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/163//console (click on "full" to see the attached output if your browser can handle that much text). |
2468 | No Perforce job exists for this issue. | 0 | 32776 | 8 years, 16 weeks, 3 days ago | 0|i05z1j: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1003 | provide a separate client library jar |
Wish | Resolved | Major | Duplicate | Unassigned | Jean-Pierre Koenig | Jean-Pierre Koenig | 24/Feb/11 02:57 | 01/Nov/11 06:07 | 01/Nov/11 06:07 | 0 | 3 | ZOOKEEPER-233 | This feature request applies to ZooKeeper, HBase, Hadoop and maybe other projects. Currently, to use one of these projects, I need to include one big jar file as a dependency that - contains the complete server code, - contains much more code than I use - and, most annoyingly, depends on many other jars that are mostly needed for the server but not for the client library. Thus when using maven and including any of the mentioned projects, the dependency graph of my project grows unnecessarily large. This is a severe problem for at least two reasons: - The probability of conflicting dependency versions gets boosted. - Especially for mapreduce jobs depending on HBase or ZooKeeper, the jars sent to the clusters grow to beyond 20-30MB of unnecessary dependencies. One could work around the problem with maven dependency exclusions, but this may lead to unpredictable runtime errors (ClassNotFound), since dependency management is only enforced at compile time. I wish we could solve the underlying issue at the root with a client library. |
client, dependencies, library, maven | 2469 | No Perforce job exists for this issue. | 0 | 33348 | 8 years, 21 weeks, 2 days ago | 0|i062kn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1002 | The Barrier sample code should create an EPHEMERAL znode instead of an EPHEMERAL_SEQUENTIAL znode |
Bug | Resolved | Minor | Invalid | Ching-Shen Chen | Ching-Shen Chen | Ching-Shen Chen | 22/Feb/11 21:02 | 23/Apr/14 18:26 | 23/Apr/14 18:26 | 3.3.2 | 3.4.7, 3.5.0 | documentation | 0 | 4 | Please see the Barrier sample code from ZooKeeper Tutorial(http://zookeeper.apache.org/doc/r3.3.1/zookeeperTutorial.html#sc_barriers), that should enable a group of processes to synchronize the beginning and the end of a computation. | documentation | 2470 | No Perforce job exists for this issue. | 1 | 32777 | 5 years, 48 weeks, 1 day ago | 0|i05z1r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-1000 | Provide SSL in zookeeper to be able to run cross colos. |
Improvement | Resolved | Major | Duplicate | Mahadev Konar | Mahadev Konar | Mahadev Konar | 21/Feb/11 21:26 | 11/Sep/19 16:33 | 21/May/19 22:20 | 26 | 51 | ZOOKEEPER-236 | This jira is to track SSL for zookeeper. The inter zookeeper server communication and the client to server communication should be over ssl so that zookeeper can be deployed over WAN's. | 2471 | No Perforce job exists for this issue. | 0 | 42079 | 43 weeks, 1 day ago | 0|i07kfz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-999 | Create a package integration project |
New Feature | Closed | Major | Fixed | Eric Yang | Eric Yang | Eric Yang | 21/Feb/11 20:36 | 23/Nov/11 14:22 | 29/Aug/11 17:52 | 3.4.0 | build | 0 | 2 | ZOOKEEPER-1064, ZOOKEEPER-1190, HADOOP-6255 | Java 6, RHEL/Ubuntu | The goal of this ticket is to generate a set of RPM/Debian packages which integrate well with the RPM sets created by HADOOP-6255. | 47517 | No Perforce job exists for this issue. | 14 | 33349 | 8 years, 30 weeks, 2 days ago | Create zookeeper rpm and deb packages. |
Reviewed
|
0|i062kv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-997 | ZkClient ignores a command if there are any spaces in front of it |
Improvement | Closed | Trivial | Duplicate | Laxman | Alex | Alex | 21/Feb/11 14:05 | 23/Nov/11 14:22 | 12/Oct/11 00:43 | 3.3.2 | 3.4.0 | java client | 0 | 3 | ZOOKEEPER-1025 | CentOS release 5.5 (Final) | ZkClient ignores a command if there are any spaces in front of it. For example: ls / causes the following output (note the space in front of ls) ZooKeeper -server host:port cmd args connect host:port get path [watch] ls path [watch] ... |
2472 | No Perforce job exists for this issue. | 0 | 33350 | 8 years, 26 weeks, 1 day ago | 0|i062l3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
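The behavior in ZOOKEEPER-997 boils down to tokenizing the raw line without trimming it first. A minimal sketch of the obvious remedy (`parse` is a hypothetical helper, not ZooKeeperMain's actual parser):

```java
public class CommandTrim {
    // Hypothetical helper: trim the line before tokenizing so "   ls /"
    // dispatches the same way as "ls /" instead of being ignored.
    static String[] parse(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] cmd = parse("   ls /");
        System.out.println(cmd[0] + " " + cmd[1]);
    }
}
```

Without the `trim()`, splitting on whitespace yields an empty first token, so the command name never matches any known command.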
| ZooKeeper | ZOOKEEPER-996 | ZkClient: stat on non-existing node causes NPE |
Bug | Resolved | Trivial | Duplicate | Unassigned | Alex | Alex | 21/Feb/11 14:02 | 27/May/11 12:08 | 27/May/11 12:08 | 3.3.2 | java client | 0 | 0 | CentOS release 5.5 (Final) | stat on non-existing node causes NPE. client quit stat /aa Exception in thread "main" java.lang.NullPointerException at org.apache.zookeeper.ZooKeeperMain.printStat(ZooKeeperMain.java:130) at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:722) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270) |
214213 | No Perforce job exists for this issue. | 0 | 32778 | 8 years, 43 weeks, 6 days ago | 0|i05z1z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
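The NPE in ZOOKEEPER-996 happens because `stat` on a non-existent node hands a null Stat to the printing code. A guarded version might look like the following (`describe` and the local `Stat` stand-in are hypothetical, not ZooKeeperMain's actual `printStat`):

```java
public class StatGuard {
    // Minimal stand-in for org.apache.zookeeper.data.Stat, for illustration only.
    static class Stat {
        long czxid = 0x12;
        int version = 1;
    }

    /** exists()/stat on a missing node yields null; report that instead of NPE-ing. */
    static String describe(Stat stat, String path) {
        if (stat == null) {
            return "Node does not exist: " + path;
        }
        return String.format("czxid = 0x%x, version = %d", stat.czxid, stat.version);
    }

    public static void main(String[] args) {
        System.out.println(describe(null, "/aa"));       // the failing case from the report
        System.out.println(describe(new Stat(), "/zookeeper"));
    }
}
```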
| ZooKeeper | ZOOKEEPER-995 | C Client exposing chroot information |
Bug | Resolved | Major | Duplicate | Unassigned | Andrei Savu | Andrei Savu | 21/Feb/11 08:22 | 24/Apr/14 20:33 | 24/Apr/14 20:33 | c client | 0 | 1 | ZOOKEEPER-1027 | $ uname -a Linux kaizen 2.6.35-25-generic #44-Ubuntu SMP Fri Jan 21 17:40:48 UTC 2011 i686 GNU/Linux $ java -version java version "1.6.0_22" Java(TM) SE Runtime Environment (build 1.6.0_22-b04) Java HotSpot(TM) Server VM (build 17.1-b03, mixed mode) $ python -c "import zookeeper;print zookeeper.__version__" 3.4.0 (latest zookeeper from the trunk) |
When creating a new node while using a chrooted connection the client function returns the full path (no chroot prefix). I've encountered this while using zkpython and that's why I suppose it's a problem related to the C bindings. It seems like the java client it's not affected by the same issue (only tested using the command line interface). I will also attach a patch with failing test. | 2473 | No Perforce job exists for this issue. | 1 | 32779 | 5 years, 48 weeks ago | 0|i05z27: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
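The expected behavior the report implies — paths handed back to the caller with the chroot prefix removed — can be sketched like this (`stripChroot` is a hypothetical helper for illustration, not the C client's actual fix):

```java
public class ChrootStrip {
    /** Strip a chroot prefix from a server-returned path before
     *  handing it back to the client application. */
    static String stripChroot(String chroot, String serverPath) {
        if (chroot == null || chroot.isEmpty() || "/".equals(chroot)) {
            return serverPath;                 // no chroot configured
        }
        if (serverPath.equals(chroot)) {
            return "/";                        // the chroot itself maps to the client root
        }
        return serverPath.startsWith(chroot + "/")
                ? serverPath.substring(chroot.length())
                : serverPath;                  // not under the chroot; leave as-is
    }

    public static void main(String[] args) {
        System.out.println(stripChroot("/app", "/app/node0000000001"));
    }
}
```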
| ZooKeeper | ZOOKEEPER-994 | "eclipse" target in the build script does not include library required for test classes in the classpath |
Bug | Closed | Minor | Fixed | MIS | MIS | MIS | 17/Feb/11 14:38 | 23/Nov/11 14:22 | 27/Feb/11 02:11 | 3.3.2 | 3.4.0 | build | 0 | 1 | 1800 | 1800 | 0% | Linux box, Eclipse IDE | The "eclipse" target in the ZooKeeper build script doesn't include the accessive.jar present in the folder /src/java/libtest in the generated .classpath file, even though accessive.jar is referenced from a couple of test classes. However, the build is successful :) |
0% | 0% | 1800 | 1800 | 47518 | No Perforce job exists for this issue. | 1 | 32780 | 9 years, 4 weeks, 2 days ago |
Reviewed
|
0|i05z2f: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-993 | Code improvements |
Improvement | Closed | Minor | Fixed | MIS | MIS | MIS | 17/Feb/11 13:31 | 23/Nov/11 14:22 | 16/Mar/11 11:59 | 3.3.2, 3.3.3 | 3.4.0 | leaderElection | 0 | 0 | 1800 | 1800 | 0% | Linux box, Eclipse IDE, | In the file org.apache.zookeeper.server.quorum.FastLeaderElection.java for methods like totalOrderPredicate and termPredicate, which return boolean, the code is as below : if (condition) return true; else return false; I feel, it would be better if the condition itself is returned. i.e., return condition. The same thing holds good else where if applicable. |
0% | 0% | 1800 | 1800 | 47519 | No Perforce job exists for this issue. | 1 | 33351 | 9 years, 2 weeks ago | 0|i062lb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
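The suggestion in ZOOKEEPER-993 is purely stylistic and behavior-preserving; the two forms below show it side by side (the signature is illustrative, not FastLeaderElection's actual `totalOrderPredicate`):

```java
public class PredicateStyle {
    // Before: the pattern the report criticizes.
    static boolean verbose(long newZxid, long curZxid, long newId, long curId) {
        if (newZxid > curZxid || (newZxid == curZxid && newId > curId)) {
            return true;
        } else {
            return false;
        }
    }

    // After: return the condition itself, as the report suggests.
    static boolean direct(long newZxid, long curZxid, long newId, long curId) {
        return newZxid > curZxid || (newZxid == curZxid && newId > curId);
    }

    public static void main(String[] args) {
        long[][] cases = {{2, 1, 1, 1}, {1, 1, 2, 1}, {1, 1, 1, 2}, {1, 2, 3, 3}};
        for (long[] c : cases) {
            if (verbose(c[0], c[1], c[2], c[3]) != direct(c[0], c[1], c[2], c[3])) {
                throw new AssertionError("forms diverge");
            }
        }
        System.out.println("both forms agree on all cases");
    }
}
```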
| ZooKeeper | ZOOKEEPER-992 | MT Native Version of Windows C Client |
New Feature | Closed | Major | Fixed | Dheeraj Agrawal | Camille Fournier | Camille Fournier | 17/Feb/11 11:29 | 23/Nov/11 14:22 | 18/Jul/11 20:59 | 3.4.0 | c client | 2 | 4 | Windows 32 | This is an extension of the work in https://issues.apache.org/jira/browse/ZOOKEEPER-859 |
47520 | No Perforce job exists for this issue. | 11 | 33352 | 8 years, 28 weeks, 1 day ago | 0|i062lj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-991 | QuorumPeer.OBSERVER_ID |
Bug | Open | Major | Unresolved | Unassigned | Sandeep Maheshwari | Sandeep Maheshwari | 14/Feb/11 01:45 | 05/Feb/20 07:17 | 3.3.2 | 3.7.0, 3.5.8 | quorum | 0 | 0 | ZOOKEEPER-933 | Windows | I don't understand why we even need this code in the first place. if (remoteSid == QuorumPeer.OBSERVER_ID) { /* * Choose identifier at random. We need a value to identify * the connection. */ remoteSid = observerCounter--; initializeMessageQueue(remoteSid); LOG.info("Setting arbitrary identifier to observer: " + remoteSid); } Even if we remove the above code from the public Long readRemoteServerID(Socket sock) {} function, FLE will still work correctly, because when any other peer (PARTICIPANT) receives a notification from the observer, that peer won't consider the observer's vote due to this check: if(!self.getVotingView().containsKey(response.sid)). Hence there is no need for that code. Also, because of the above code there is a possibility of creating redundant threads (SendWorker/ReceiveWorker): when the same participant tries to initiate a connection with the same peer, we do (sid = observerCounter--;), so the same observer gets a different sid each time, and a corresponding thread is created which is of no use. Please let me know if I am correct. |
2474 | No Perforce job exists for this issue. | 0 | 32781 | 9 years, 6 weeks, 2 days ago | 0|i05z2n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-990 | random session timeout when there is a large number of sessions |
Bug | Open | Major | Unresolved | Unassigned | Xiaowei Jiang | Xiaowei Jiang | 13/Feb/11 19:20 | 14/Feb/11 12:04 | 3.3.2 | server | 0 | 1 | When there is large number of sessions, random session timeout starts after a few hours. It happens even though the load on the server is small (less than 1 out of 8 process busy and plenty of memory). Increase the timeout to 300 seconds only delays this but the session timeout eventually happens. | 2475 | No Perforce job exists for this issue. | 0 | 32782 | 9 years, 6 weeks, 3 days ago | 0|i05z2v: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-989 | ZK servers not balanced in number of sessions |
Bug | Open | Minor | Unresolved | Unassigned | Xiaowei Jiang | Xiaowei Jiang | 13/Feb/11 19:16 | 19/Mar/11 16:15 | 3.3.2 | c client | 0 | 1 | ZOOKEEPER-1018 | In a 5-machine ZK cluster, when there is a large number of sessions, the 1st server seems to get more sessions: it gets 25% of the sessions, while each of the remaining servers gets 18.75%. |
2476 | No Perforce job exists for this issue. | 0 | 32783 | 9 years, 1 week, 5 days ago | 0|i05z33: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-988 | ZK server hang on leader election |
Bug | Resolved | Major | Incomplete | Unassigned | Xiaowei Jiang | Xiaowei Jiang | 13/Feb/11 19:13 | 14/Oct/13 19:52 | 14/Oct/13 19:52 | 3.3.2 | leaderElection | 0 | 2 | org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run thread exited unexpected, so the server hang on leader election. QuorumPeer:/0.0.0.0:2181: [1] sun.misc.Unsafe.park (native method) [2] java.util.concurrent.locks.LockSupport.parkNanos (LockSupport.java:198) [3] java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos (AbstractQueuedSynchronizer.java:1,963) [4] java.util.concurrent.LinkedBlockingQueue.poll (LinkedBlockingQueue.java:395) [5] org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader (FastLeaderElection.java:677) [6] org.apache.zookeeper.server.quorum.QuorumPeer.run (QuorumPeer.java:621) |
2477 | No Perforce job exists for this issue. | 0 | 32784 | 9 years, 6 weeks, 3 days ago | 0|i05z3b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-987 | Fatal error after reelection |
Bug | Resolved | Major | Not A Problem | Unassigned | Xiaowei Jiang | Xiaowei Jiang | 13/Feb/11 19:10 | 14/Feb/11 01:14 | 14/Feb/11 01:14 | 3.3.2 | server | 0 | 0 | ZK server hit fatal error after leader re-election: 2011-01-17 14:38:29,709 - DEBUG [WorkerSender Thread:QuorumCnxManager@384] - There is a connection already for server 4 2011-01-17 14:38:30,111 - DEBUG [WorkerReceiver Thread:FastLeaderElection$Messenger$WorkerReceiver@214] - Receive new notification message. My id = 1 2011-01-17 14:38:30,111 - INFO [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 8589936845 (n.zxid), 6 (n.round), LOOKING (n.state), 4 (n.sid), FOLLOWING (my state) 2011-01-17 14:38:30,111 - DEBUG [WorkerReceiver Thread:FastLeaderElection$Messenger$WorkerReceiver@288] - Sending new notification. My id = 1, Recipient = 4 2011-01-17 14:38:30,112 - DEBUG [WorkerSender Thread:QuorumCnxManager@384] - There is a connection already for server 4 2011-01-17 14:38:34,115 - INFO [QuorumPeer:/0.0.0.0:2181:Learner@315] - Setting leader epoch 3 2011-01-17 14:38:34,117 - WARN [QuorumPeer:/0.0.0.0:2181:Follower@116] - Got zxid 0x2000008ce expected 0x1 2011-01-17 14:38:34,117 - INFO [QuorumPeer:/0.0.0.0:2181:FileTxnSnapLog@208] - Snapshotting: 300000000 2011-01-17 14:38:37,346 - WARN [QuorumPeer:/0.0.0.0:2181:Follower@116] - Got zxid 0x300000001 expected 0x2000008cf 2011-01-17 14:38:37,988 - FATAL [QuorumPeer:/0.0.0.0:2181:FollowerZooKeeperServer@112] - Committing zxid 0x300000001 but next pending txn 0x2000008ce |
214212 | No Perforce job exists for this issue. | 0 | 32785 | 9 years, 6 weeks, 3 days ago | 0|i05z3j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-986 | In QuorumCnxManager we are adding the sent message to lastMessageSent, but we are never removing that message from it after sending it, so this will lead to sending the same message again in the next round |
Bug | Resolved | Minor | Not A Problem | Unassigned | Sandeep Maheshwari | Sandeep Maheshwari | 11/Feb/11 07:04 | 19/May/14 17:42 | 19/May/14 17:42 | 3.3.2 | 3.5.0 | quorum | 0 | 1 | Windows | Function for sending out the notification message to the corresponding peer for leader election: private void processMessages() throws Exception { try { ByteBuffer b = getLastMessageSent(sid); if (b != null) { send(b); } } catch (IOException e) { LOG.error("Failed to send last message to " + sid, e); throw e; } try { ArrayBlockingQueue<ByteBuffer> bq = queueSendMap.get(sid); if (bq == null) { dumpQueueSendMap(); throw new Exception("No queue for incoming messages for " + "sid=" + sid); } while (running && !shutdown && sock != null) { ByteBuffer b = null; try { b = bq.poll(1000, TimeUnit.MILLISECONDS); if(b != null){ recordLastMessageSent(sid, b); send(b); } } catch (InterruptedException e) { LOG.warn("Interrupted while waiting for message on " + "queue", e); } } } catch (Exception e) { LOG.warn("Exception when using channel: for id " + sid + " my id = " + self.getId() + " error = ", e); throw e; } } This is the code taken from ZooKeeper patch 932. Here we are adding the message to be sent in the current round to lastMessageSent, but in the next round that message will still be there. So when we try to send a new message to the server, it will again do ByteBuffer b = getLastMessageSent(sid); if (b != null) { send(b); } and send that old message back to the server again. In this way it will send every message twice. Though it will not affect the correctness of FLE, sending every message twice creates extra overhead and slows down the election process. |
gsoc | 36636 | No Perforce job exists for this issue. | 0 | 32786 | 5 years, 44 weeks, 3 days ago | 0|i05z3r: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
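The "Not A Problem" resolution reflects that the resend of lastMessageSent only happens when a connection is (re)established, and FLE tolerates duplicate notifications. A toy model of that behavior (names and structure are illustrative, not the real QuorumCnxManager):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class SendWorkerSketch {
    // What the remote peer would receive, in order.
    static final List<String> wire = new ArrayList<>();
    static String lastMessageSent = null;

    /** One SendWorker lifetime: on reconnect, resend the last message (it may
     *  have been lost when the previous connection died), then drain the queue. */
    static void sendWorkerRun(Queue<String> queue, boolean reconnect) {
        if (reconnect && lastMessageSent != null) {
            wire.add(lastMessageSent);   // possible duplicate; FLE notifications are idempotent
        }
        String m;
        while ((m = queue.poll()) != null) {
            lastMessageSent = m;
            wire.add(m);
        }
    }

    public static void main(String[] args) {
        Queue<String> q = new ArrayDeque<>();
        q.add("n1");
        q.add("n2");
        sendWorkerRun(q, false);         // first connection: nothing to resend
        q.add("n3");
        sendWorkerRun(q, true);          // reconnect: n2 goes out again, then n3
        System.out.println(wire);
    }
}
```

The duplicate ("n2" twice above) costs a little bandwidth but cannot change the election outcome, which is why the resend was judged acceptable.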
| ZooKeeper | ZOOKEEPER-985 | Test BookieRecoveryTest fails on trunk. |
Bug | Closed | Major | Fixed | Flavio Paiva Junqueira | Mahadev Konar | Mahadev Konar | 09/Feb/11 14:24 | 23/Nov/11 14:22 | 18/Feb/11 12:55 | 3.3.3, 3.4.0 | contrib-bookkeeper | 0 | 1 | Darwin moststock-lm 9.7.0 Darwin Kernel Version 9.7.0: Tue Mar 31 22:52:17 PDT 2009; root:xnu-1228.12.14~1/RELEASE_I386 i386 (mac). | The unit test fails on trunk on my mac. I think this might be the same on other platforms as well. Ill attach the error logs. | 47521 | No Perforce job exists for this issue. | 3 | 32787 | 9 years, 5 weeks, 5 days ago |
Reviewed
|
0|i05z3z: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-984 | jenkins failure in testSessionMoved - NPE in quorum |
Bug | Resolved | Blocker | Cannot Reproduce | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 07/Feb/11 13:34 | 28/Feb/19 14:47 | 24/Dec/13 05:39 | 3.3.2 | 3.5.0 | 0 | 6 | Got the following NPE on my internal jenkins setup running against released 3.3.2 (see attached log) {noformat} [junit] 2011-02-06 10:39:56,988 - WARN [QuorumPeer:/0.0.0.0:11365:Follower@116] - Got zxid 0x100000001 expected 0x1 [junit] 2011-02-06 10:39:56,988 - INFO [SyncThread:3:FileTxnLog@197] - Creating new log file: log.100000001 [junit] 2011-02-06 10:39:56,989 - WARN [QuorumPeer:/0.0.0.0:11364:Follower@116] - Got zxid 0x100000001 expected 0x1 [junit] 2011-02-06 10:39:56,989 - INFO [SyncThread:2:FileTxnLog@197] - Creating new log file: log.100000001 [junit] 2011-02-06 10:39:56,990 - WARN [QuorumPeer:/0.0.0.0:11363:Follower@116] - Got zxid 0x100000001 expected 0x1 [junit] 2011-02-06 10:39:56,990 - INFO [SyncThread:5:FileTxnLog@197] - Creating new log file: log.100000001 [junit] 2011-02-06 10:39:56,990 - WARN [QuorumPeer:/0.0.0.0:11366:Follower@116] - Got zxid 0x100000001 expected 0x1 [junit] 2011-02-06 10:39:56,990 - INFO [SyncThread:1:FileTxnLog@197] - Creating new log file: log.100000001 [junit] 2011-02-06 10:39:56,991 - INFO [SyncThread:4:FileTxnLog@197] - Creating new log file: log.100000001 [junit] 2011-02-06 10:39:56,995 - INFO [main-SendThread(localhost.localdomain:11363):ClientCnxn$SendThread@738] - Session establishment complete on server localhost.localdomain/127.0.0.1:11363, sessionid = 0x12dfc45e6dd0000, negotiated timeout = 30000 [junit] 2011-02-06 10:39:56,996 - INFO [CommitProcessor:1:NIOServerCnxn@1580] - Established session 0x12dfc45e6dd0000 with negotiated timeout 30000 for client /127.0.0.1:37810 [junit] 2011-02-06 10:39:56,999 - INFO [main:ZooKeeper@436] - Initiating client connection, connectString=127.0.0.1:11364 sessionTimeout=30000 watcher=org.apache.zookeeper.test.QuorumTest$5@248523a0 sessionId=85001345146093568 sessionPasswd=<hidden> [junit] 2011-02-06 10:39:57,000 
- INFO [main-SendThread():ClientCnxn$SendThread@1041] - Opening socket connection to server /127.0.0.1:11364 [junit] 2011-02-06 10:39:57,000 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11364:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:36682 [junit] 2011-02-06 10:39:57,001 - INFO [main-SendThread(localhost.localdomain:11364):ClientCnxn$SendThread@949] - Socket connection established to localhost.localdomain/127.0.0.1:11364, initiating session [junit] 2011-02-06 10:39:57,002 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11364:NIOServerCnxn@770] - Client attempting to renew session 0x12dfc45e6dd0000 at /127.0.0.1:36682 [junit] 2011-02-06 10:39:57,002 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11364:Learner@95] - Revalidating client: 85001345146093568 [junit] 2011-02-06 10:39:57,003 - INFO [QuorumPeer:/0.0.0.0:11364:NIOServerCnxn@1580] - Established session 0x12dfc45e6dd0000 with negotiated timeout 30000 for client /127.0.0.1:36682 [junit] 2011-02-06 10:39:57,004 - INFO [main-SendThread(localhost.localdomain:11364):ClientCnxn$SendThread@738] - Session establishment complete on server localhost.localdomain/127.0.0.1:11364, sessionid = 0x12dfc45e6dd0000, negotiated timeout = 30000 [junit] 2011-02-06 10:39:57,005 - WARN [CommitProcessor:2:NIOServerCnxn@1524] - Unexpected exception. Destruction averted. 
[junit] java.lang.NullPointerException [junit] at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123) [junit] at org.apache.zookeeper.proto.SetDataResponse.serialize(SetDataResponse.java:40) [junit] at org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123) [junit] at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1500) [junit] at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367) [junit] at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) [junit] Running org.apache.zookeeper.test.QuorumTest [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec [junit] Test org.apache.zookeeper.test.QuorumTest FAILED (timeout) [junit] 2011-02-06 10:53:26,189 - INFO [main:PortAssignment@31] - assigning port 11221 [junit] 2011-02-06 10:53:26,192 - INFO [main:PortAssignment@31] - assigning port 11222 {noformat} |
36637 | No Perforce job exists for this issue. | 1 | 32788 | 1 year, 3 weeks ago | 0|i05z47: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-983 | running zkServer.sh start remotely using ssh hangs |
Bug | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 03/Feb/11 00:21 | 23/Nov/11 14:22 | 27/Feb/11 01:57 | 3.3.2 | 3.4.0 | scripts | 0 | 2 | If zkServer.sh is run remotely using ssh as follows ssh will "hang" - i.e. not complete/return once the server is started. This is even though zkServer.sh starts the java vm in the background. $ ssh <host> "zkServer.sh start" this is due to the following issue: http://www.slac.stanford.edu/comp/unix/ssh_faq.html#logoff_hangs |
37456 | No Perforce job exists for this issue. | 1 | 30000 | 8 years, 42 weeks ago |
Reviewed
|
0|i05hx3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-982 | zkServer.sh won't start ZooKeeper on an Ubuntu 10.10 system due to a bug in the startup script. |
Bug | Resolved | Minor | Invalid | Thomas Koch | Bjørn Remseth | Bjørn Remseth | 02/Feb/11 04:52 | 12/Dec/11 12:47 | 12/Dec/11 12:47 | 3.3.1 | 3.5.0 | scripts | 0 | 2 | When running "zkServer.sh start" I get these error messages: ==== $sudo sh bin/zkServer.sh start JMX enabled by default bin/zkServer.sh: 69: cygpath: not found Using config: grep: : No such file or directory Starting zookeeper ... STARTED $ Invalid config, exiting abnormally ==== The "Invalid config..." text is output from the server, which terminates immediately after this message has been printed. The fix is easy: inside zkServer.sh change the line ==== if $cygwin ==== into ==== if [ -n "$cygwin" ] ==== This fixes the problem and makes the server run. |
70801 | No Perforce job exists for this issue. | 1 | 32789 | 8 years, 15 weeks, 5 days ago | 0|i05z4f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-981 | Hang in zookeeper_close() in the multi-threaded C client |
Bug | Closed | Critical | Fixed | Jeremy Stribling | Jeremy Stribling | Jeremy Stribling | 01/Feb/11 15:23 | 23/Nov/11 14:22 | 14/Sep/11 00:10 | 3.3.2 | 3.4.0 | c client | 1 | 7 | Debian Squeeze, Linux 2.6.32-5, x86_64 | I saw a hang once when my C++ application called the zookeeper_close() method of the multi-threaded Zookeeper client library. The stack trace of the hung thread was the following: {quote} Thread 8 (Thread 5644): #0 0x00007f5d7bb5bbe4 in __lll_lock_wait () from /lib/libpthread.so.0 #1 0x00007f5d7bb59ad0 in pthread_cond_broadcast@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0x00007f5d793628f6 in unlock_completion_list (l=0x32b4d68) at .../zookeeper/src/c/src/mt_adaptor.c:66 #3 0x00007f5d79354d4b in free_completions (zh=0x32b4c80, callCompletion=1, reason=-116) at .../zookeeper/src/c/src/zookeeper.c:1069 #4 0x00007f5d79355008 in cleanup_bufs (zh=0x32b4c80, callCompletion=1, rc=-116) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:1125 #5 0x00007f5d79353200 in destroy (zh=0x32b4c80) at .../thirdparty/zookeeper/src/c/src/zookeeper.c:366 #6 0x00007f5d79358e0e in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2326 #7 0x00007f5d79356d18 in api_epilog (zh=0x32b4c80, rc=0) at .../zookeeper/src/c/src/zookeeper.c:1661 #8 0x00007f5d79362f2f in adaptor_finish (zh=0x32b4c80) at .../zookeeper/src/c/src/mt_adaptor.c:205 #9 0x00007f5d79358c8c in zookeeper_close (zh=0x32b4c80) at .../zookeeper/src/c/src/zookeeper.c:2297 ... {quote} The omitted part of the stack trace is entirely within my application, and contains no other calls to/from the Zookeeper client. In particular, I am not calling zookeeper_close() from within a completion handler or any of the library's threads. I haven't been able to reproduce this, and when I encountered this I wasn't capturing logging from the client library, so unfortunately I don't have any more information at this time. But I will update this JIRA if I see it again. |
47522 | No Perforce job exists for this issue. | 3 | 32790 | 8 years, 18 weeks, 6 days ago |
Reviewed
|
0|i05z4n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-980 | allow configuration parameters for log4j.properties |
Improvement | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 01/Feb/11 03:12 | 26/Apr/15 14:30 | 09/Feb/11 18:43 | 3.4.0 | 0 | 1 | ZOOKEEPER-2170 | log4j.properties can contain properties that may be overridden using system properties. Hadoop's bin/hadoop already does this; I will replicate it in ZK's config. | 37457 | No Perforce job exists for this issue. | 1 | 30004 | 9 years, 7 weeks ago |
Reviewed
|
0|i05hxz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-979 | UnknownHostException in QuorumCnxManager |
Bug | Open | Minor | Unresolved | Unassigned | Hugh Warrington | Hugh Warrington | 27/Jan/11 11:44 | 28/Jan/11 09:10 | 3.3.2 | server | 0 | 3 | I'm using zk 3.3.2 and I'm seeing this in my logs around startup: 2011-01-27 10:16:21,513 [WorkerSender Thread] WARN org.apache.zookeeper.server.quorum.QuorumCnxManager - Cannot open channel to 0 at election address xxx.yyy.com/10.2.131.19:3888 java.net.UnknownHostException at sun.nio.ch.Net.translateException(Net.java:100) at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:140) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:366) at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:335) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360) at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333) at java.lang.Thread.run(Thread.java:636) And all subsequent zk ops give {{ConnectionLossException}}. I've just explained this to breed_zk on IRC, and he asked me to file a ticket, mentioning that UnknownHostException may sometimes be thrown for reasons other than host resolution. While I'm reasonably certain that the hostname is correct and should be contactable, I need to put some more time into checking our network setup to be absolutely sure. However, two observations arose while looking into this: * At the top of QuorumCnxManager.connectOne(), we set electionAddr (or fail and return). But then a few lines later we don't actually use this local variable in the call to connect(). This seems like a minor programming mistake (although AFAICT it doesn't change the behaviour). * In the subsequent catch block, the UnknownHostException that's thrown doesn't contain the address that we were trying to connect to (though if you capture WARN log messages, you can see what it was). |
36638 | No Perforce job exists for this issue. | 0 | 32791 | 9 years, 8 weeks, 6 days ago | 0|i05z4v: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-978 | ZookeeperServer does not close zk database on shutdown |
Bug | Resolved | Major | Duplicate | Thomas Koch | Sergei Bobovich | Sergei Bobovich | 20/Jan/11 12:04 | 17/May/14 22:33 | 17/May/14 22:33 | 3.3.2 | 3.4.6, 3.5.0 | server | 0 | 2 | ZookeeperServer does not close the zk database on shutdown, leaving log files open. Not sure if this is intentional, but it looks like a possible bug to me. The database is only closed from the QuorumPeer class. Hit this when executing regression tests on Windows: cleanup failed to delete the log files. |
35 | No Perforce job exists for this issue. | 2 | 32792 | 5 years, 44 weeks, 4 days ago | 0|i05z53: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-977 | passing null for path_buffer in zoo_create |
Improvement | Closed | Major | Fixed | Benjamin Reed | Benjamin Reed | Benjamin Reed | 19/Jan/11 14:43 | 23/Nov/11 14:22 | 08/Feb/11 22:14 | 3.4.0 | 0 | 0 | It is unclear from the comments for zoo_create whether a NULL can be passed for path_buffer. | 47523 | No Perforce job exists for this issue. | 1 | 33353 | 9 years, 7 weeks, 1 day ago |
Reviewed
|
0|i062lr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-976 | ZooKeeper startup script doesn't use JAVA_HOME |
Bug | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 17/Jan/11 20:36 | 23/Nov/11 14:22 | 27/Feb/11 02:02 | 3.3.2 | 3.4.0 | 0 | 2 | HADOOP-7092 | From bug filed on CDH: https://issues.cloudera.org/browse/DISTRO-47 - moving it to this jira to address: ------------------------------------------------------ Bug filed by "grep.alex" at http://getsatisfaction.com/cloudera/topics/cdh3b3_zookeeper_startup_script_doesnt_use_java_home On RedHat 5 (using the RPM installer) I was able to install and run all the Hadoop components. The Zookeeper install was fine, but it wouldn't start: {noformat} [root@aholmes-desktop init.d]# ./hadoop-zookeeper start JMX enabled by default Using config: /etc/zookeeper/zoo.cfg Starting zookeeper ... STARTED [root@aholmes-desktop init.d]# Exception in thread "main" java.lang.NoSuchMethodError: method java.lang.management.ManagementFactory.getPlatformMBeanServer with signature ()Ljavax.management.MBeanServer; was not found. at org.apache.zookeeper.jmx.ManagedUtil.registerLog4jMBeans(ManagedUtil.java:48 ... {noformat} After some digging around I found the cause - the Zookeeper startup script (/usr/lib/zookeeper/bin/zkServer.sh) uses the java found in the path, whereas the other startup scripts use JAVA_HOME. In my case I had the default RHEL5 1.4 JDK in the path, and the 1.6 JDK RPMs installed under /usr/java, hence the above error, which I'm guessing is a fairly common setup. In my opinion, all the startup scripts should use the same mechanism to determine where to find java. |
37455 | No Perforce job exists for this issue. | 2 | 32793 | 9 years, 4 weeks, 2 days ago |
Reviewed
|
0|i05z5b: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-975 | new peer goes in LEADING state even if ensemble is online |
Bug | Closed | Major | Fixed | Vishal Kher | Vishal Kher | Vishal Kher | 14/Jan/11 07:29 | 23/Nov/11 14:22 | 29/Apr/11 12:13 | 3.3.2 | 3.4.0 | 0 | 4 | Scenario: 1. 2 of the 3 ZK nodes are online 2. Third node is attempting to join 3. Third node unnecessarily goes in "LEADING" state 4. Then third goes back to LOOKING (no majority of followers) and finally goes to FOLLOWING state. While going through the logs I noticed that a peer C that is trying to join an already formed cluster goes in LEADING state. This is because QuorumCnxManager of A and B sends the entire history of notification messages to C. C receives the notification messages that were exchanged between A and B when they were forming the cluster. In FastLeaderElection.lookForLeader(), due to the following piece of code, C quits lookForLeader assuming that it is supposed to lead. 740 //If have received from all nodes, then terminate 741 if ((self.getVotingView().size() == recvset.size()) && 742 (self.getQuorumVerifier().getWeight(proposedLeader) != 0)){ 743 self.setPeerState((proposedLeader == self.getId()) ? 744 ServerState.LEADING: learningState()); 745 leaveInstance(); 746 return new Vote(proposedLeader, proposedZxid); 747 748 } else if (termPredicate(recvset, This can cause: 1. C to unnecessarily go in LEADING state and wait for tickTime * initLimit and then restart the FLE. 2. C waits for 200 ms (finalizeWait) and then considers whatever notifications it has received to make a decision. C could potentially decide to follow an old leader, fail to connect to the leader, and then restart FLE. See code below. 
752 if (termPredicate(recvset, 753 new Vote(proposedLeader, proposedZxid, 754 logicalclock))) { 755 756 // Verify if there is any change in the proposed leader 757 while((n = recvqueue.poll(finalizeWait, 758 TimeUnit.MILLISECONDS)) != null){ 759 if(totalOrderPredicate(n.leader, n.zxid, 760 proposedLeader, proposedZxid)){ 761 recvqueue.put(n); 762 break; 763 } 764 } In general, this does not affect correctness of FLE since C will eventually go back to FOLLOWING state (A and B won't vote for C). However, this delays C from joining the cluster. This can in turn affect recovery time of an application. Proposal: A and B should send only the latest notification (most recent) instead of the entire history. Does this sound reasonable? |
47524 | No Perforce job exists for this issue. | 7 | 32794 | 8 years, 47 weeks, 5 days ago | 0|i05z5j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-974 | Configurable listen socket backlog for the client port |
Improvement | Resolved | Minor | Fixed | Josh Elser | Hoonmin Kim | Hoonmin Kim | 10/Jan/11 03:35 | 04/Oct/19 10:55 | 13/Feb/19 07:13 | 3.3.2 | 3.6.0 | server | 0 | 4 | 0 | 10200 | We've been running a ZooKeeper ensemble (3-node configuration) in production for months. Days ago, we suffered temporary (apparently network) problems that caused many reconnections (about 300) of ephemeral nodes on one ZooKeeper server. Almost all clients successfully reconnected to the other ZooKeeper servers, but one client failed to reconnect in time and got a session expired message from the server. (The problem is that our clients died when they got the SessionExpired message.) There were many listenQ overflows/drops and out resets in the minute just before the problem occurred. --- So we patched ZooKeeper to increase the backlog size for the client port socket to avoid unhappy cases like this. As ZooKeeper uses the default backlog size (50) when calling bind(), we added a "clientPortBacklog" option. Though the default backlog should be fine for common environments, we believe that making the size configurable is also worthwhile. [Note] On Linux, the kernel parameter net.core.somaxconn needs to be larger than "clientPortBacklog" for the listen socket backlog to be configured correctly |
100% | 100% | 10200 | 0 | pull-request-available | 36639 | No Perforce job exists for this issue. | 3 | 42081 | 1 year, 5 weeks, 1 day ago | backlog | 0|i07kgf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-973 | bind() could fail on Leader because it does not setReuseAddress on its ServerSocket |
Bug | Resolved | Trivial | Fixed | Harsh J | Vishal Kher | Vishal Kher | 05/Jan/11 01:47 | 24/Jan/12 05:59 | 23/Jan/12 15:35 | 3.3.2 | 3.4.3, 3.3.5, 3.5.0 | server | 0 | 3 | setReuseAddress(true) should be used below. Leader(QuorumPeer self,LeaderZooKeeperServer zk) throws IOException { this.self = self; try { ss = new ServerSocket(self.getQuorumAddress().getPort()); } catch (BindException e) { LOG.error("Couldn't bind to port " + self.getQuorumAddress().getPort(), e); throw e; } this.zk=zk; } |
36640 | No Perforce job exists for this issue. | 2 | 32795 | 8 years, 9 weeks, 2 days ago |
Reviewed
|
0|i05z5r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-972 | perl Net::ZooKeeper segfaults when setting a watcher on get_children |
Bug | Open | Major | Unresolved | Unassigned | Robert Powers | Robert Powers | 03/Jan/11 14:06 | 05/Feb/20 07:16 | 3.3.2 | 3.7.0, 3.5.8 | contrib-bindings | 0 | 0 | rhel 5.3, perl 5.10, Net::Zookeeper-1.35, zookeeper_c_client-3.3.2 and below. | The issue I'm seeing seems strikingly similar to this: https://issues.apache.org/jira/browse/ZOOKEEPER-772 I have one writer process which adds sequenced children nodes to /queue and a separate reader process which sets a children watcher on /queue, waiting for children to be added or deleted. Long story short, every time a child node is added or deleted by the writer, the reader's watcher is supposed to trigger so the reader can check if it's time to get to work or go back to bed. Bad things seem to happen while the reader is waiting on the watcher and the writer adds or deletes a node. In versions prior to 3.3.2, my code that sets a watcher on the children of a node using the perl binding would either lock up when trying to retrieve the children or would segfault when a child node was added while waiting on the watch. In 3.3.2, it seems to just do the locking up. I'm seeing this: assertion botched (free()ed/realloc()ed-away memory was overwritten?): !(MallocCfg[MallocCfg_filldead] && MallocCfg[MallocCfg_fillcheck]) || !cmp_pat_4bytes((unsigned char*)(p + 1), (((1 << ((bucket) >> 0)) + ((bucket >= 15 * 1) ? 4096 : 0)) - (sizeof(union overhead) + sizeof (unsigned int))) + sizeof (unsigned int), fill_deadbeef) (malloc.c:1536) I managed to get a stack trace Program received signal SIGABRT, Aborted. 0xffffe410 in __kernel_vsyscall () (gdb) where #0 0xffffe410 in __kernel_vsyscall () #1 0xf7b8ed80 in raise () from /lib/libc.so.6 #2 0xf7b90691 in abort () from /lib/libc.so.6 #3 0xf7d6d53f in botch (diag=0xa <Address 0xa out of bounds>, s=0xf7ef42e8 "!(MallocCfg[MallocCfg_filldead] && MallocCfg[MallocCfg_fillcheck]) || !cmp_pat_4bytes((unsigned char*)(p + 1), (((1 << ((bucket) >> 0)) + ((bucket >= 15 * 1) ? 
4096 : 0)) - (sizeof(union overhead) + s"..., file=0xf7ef4119 "malloc.c", line=1536) at malloc.c:1327 #4 0xf7d6d97a in Perl_malloc (nbytes=15530) at malloc.c:1535 #5 0xf7d6f974 in Perl_calloc (elements=1, size=0) at malloc.c:2314 #6 0xf7929eca in _zk_create_watch (my_perl=0x0) at ZooKeeper.xs:204 #7 0xf7929f8f in _zk_acquire_watch (my_perl=0x0) at ZooKeeper.xs:240 #8 0xf793450b in XS_Net__ZooKeeper_watch (my_perl=0x889c008, cv=0x89db8b4) at ZooKeeper.xs:2035 #9 0xf7e1dd67 in Perl_pp_entersub (my_perl=0x889c008) at pp_hot.c:2847 #10 0xf7de47ce in Perl_runops_debug (my_perl=0x889c008) at dump.c:1931 #11 0xf7e0d856 in perl_run (my_perl=0x889c008) at perl.c:2384 #12 0x08048ace in main (argc=2, argv=0xffe11814, env=0xffe11820) at perlmain.c:113 The code to reproduce: sub bide_time { my $root = '/queue'; my $timeout = 20*1000; my $zkc = Net::ZooKeeper->new('localhost:2181'); while (1) { print "Retrieving $root\n"; my $child_watch = $zkc->watch('timeout' => $timeout); my @children = $zkc->get_children($root, watch=>$child_watch); if (scalar(@children)) { return @children if (rand(1) > 0.75); } else { print " - No Children.\n"; } print "Time to wait for the Children.\n"; if ($child_watch->wait()) { print "watch triggered on node $root:\n"; print " event: $child_watch->{event}\n"; print " state: $child_watch->{state}\n"; } else { print "watch timed out\n"; } } } |
36641 | No Perforce job exists for this issue. | 0 | 32796 | 9 years, 12 weeks, 3 days ago | 0|i05z5z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-971 | Replace Packet class with Operation classes |
Improvement | Open | Minor | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 30/Dec/10 13:20 | 30/Dec/10 13:20 | 0 | 1 | The operation classes introduced in ZOOKEEPER-911 can be used to replace the Packet class entirely. Then it would also be possible to move the code from the ugly big if clause in EventThread.processEvent to the individual operation classes. This cleanup may help to prepare the code for the move from jute to avro. |
214211 | No Perforce job exists for this issue. | 0 | 42082 | 9 years, 13 weeks ago | 0|i07kgn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-970 | ZOOKEEPER-835 Review and refactor Java client close logic |
Sub-task | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 30/Dec/10 12:14 | 30/Dec/10 13:29 | 0 | 1 | There have been several jira tickets to fix the close logic but there are still possibilities for blocking as discovered in ZOOKEEPER-911. For example the failing server.InvalidSnapshotTest times out because the ClientCnxn.close() call blocks in Packet.waitForFinish(). However the only change introduced is that instead of synchronized(packet) { while(!packet.finished) packet.wait(); } I call packet.waitForFinish(), which is a synchronized method. The bug is in ClientCnxn.queuePacket: ClientCnxn.closing is set to true before the closeSession Packet is added to outgoingQueue. Between these two steps, the SendThread can already terminate, so that there's nobody left to call packet.notifyAll(). |
214210 | No Perforce job exists for this issue. | 0 | 42083 | 9 years, 13 weeks ago | 0|i07kgv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-969 | ZOOKEEPER-835 stat parameter in asynchronous getACL() method is superfluous |
Sub-task | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 29/Dec/10 11:26 | 29/Dec/10 11:26 | 0 | 1 | 214209 | No Perforce job exists for this issue. | 0 | 42084 | 9 years, 13 weeks, 1 day ago | 0|i07kh3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-968 | ZOOKEEPER-965 Database multi-update |
Sub-task | Closed | Major | Not A Problem | Unassigned | Ted Dunning | Ted Dunning | 29/Dec/10 02:06 | 23/Nov/11 14:22 | 16/Jul/11 15:12 | 3.4.0 | 0 | 1 | This includes the database operations themselves | 67883 | No Perforce job exists for this issue. | 0 | 33354 | 8 years, 36 weeks, 5 days ago | 0|i062lz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-967 | ZOOKEEPER-965 Server side decoding and function dispatch |
Sub-task | Closed | Major | Fixed | Unassigned | Ted Dunning | Ted Dunning | 29/Dec/10 02:05 | 23/Nov/11 14:22 | 02/May/11 13:22 | 3.4.0 | 0 | 0 | This would include making the server catch the request and hand it down to the actual transaction code | 47525 | No Perforce job exists for this issue. | 0 | 33355 | 8 years, 47 weeks, 3 days ago | 0|i062m7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-966 | ZOOKEEPER-965 Client side for multi |
Sub-task | Closed | Major | Fixed | Unassigned | Ted Dunning | Ted Dunning | 29/Dec/10 02:04 | 23/Nov/11 14:22 | 02/May/11 13:21 | 3.4.0 | 0 | 2 | This is just the client side of the code up to and including the serialization of requests. | 47526 | No Perforce job exists for this issue. | 0 | 33356 | 8 years, 47 weeks, 3 days ago | 0|i062mf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-965 | Need a multi-update command to allow multiple znodes to be updated safely |
New Feature | Closed | Major | Fixed | Ted Dunning | Ted Dunning | Ted Dunning | 27/Dec/10 19:18 | 23/Nov/11 14:22 | 30/Jun/11 18:54 | 3.3.3 | 3.4.0 | 0 | 14 | ZOOKEEPER-966, ZOOKEEPER-967, ZOOKEEPER-968 | ZOOKEEPER-1124, ZOOKEEPER-911 | The basic idea is to have a single method called "multi" that will accept a list of create, delete, update or check objects each of which has a desired version or file state in the case of create. If all of the version and existence constraints can be satisfied, then all updates will be done atomically. Two API styles have been suggested. One has a list as above and the other style has a "Transaction" that allows builder-like methods to build a set of updates and a commit method to finalize the transaction. This can trivially be reduced to the first kind of API so the list based API style should be considered the primitive and the builder style should be implemented as syntactic sugar. The total size of all the data in all updates and creates in a single transaction should be limited to 1MB. Implementation-wise this capability can be done using standard ZK internals. The changes include: - update to ZK clients to allow the new call - additional wire level request - on the server, in the code that converts transactions to idempotent form, the code should be slightly extended to convert a list of operations to idempotent form. - on the client, a down-rev server that rejects the multi-update should be detected gracefully and an informative exception should be thrown. To facilitate shared development, I have established a github repository at https://github.com/tdunning/zookeeper and am happy to extend committer status to anyone who agrees to donate their code back to Apache. The final patch will be attached to this bug as normal. |
47527 | No Perforce job exists for this issue. | 21 | 33357 | 8 years, 38 weeks, 4 days ago |
Reviewed
|
0|i062mn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-964 | How to avoid dead nodes being generated? These nodes can't be deleted because their parent doesn't have delete and setacl permission. |
Wish | Resolved | Major | Won't Fix | Unassigned | allengao | allengao | 26/Dec/10 22:26 | 15/May/14 16:53 | 15/May/14 16:53 | 3.3.2 | 3.5.0 | server | 0 | 4 | 1209600 | 1209600 | 0% | i686-suse-linux | When a node that does not have setacl and delete permission is created (e.g. permits=0x01), its children can never be deleted, even using superDigest. So, how can this situation be avoided? | 0% | 0% | 1209600 | 1209600 | 36642 | No Perforce job exists for this issue. | 1 | 42085 | 5 years, 45 weeks, 6 days ago | dead node | 0|i07khb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-963 | Make Forrest work with JDK6 |
Bug | Closed | Major | Fixed | Carl Steinbach | Carl Steinbach | Carl Steinbach | 23/Dec/10 03:34 | 23/Nov/11 14:22 | 28/Dec/10 20:08 | 3.3.3, 3.4.0 | build, documentation | 0 | 1 | ZOOKEEPER-925 | It's possible to make Forrest work with JDK6 by disabling sitemap validation in the forrest.properties file. See FOR-984 and PIG-1508 for more details. |
47528 | No Perforce job exists for this issue. | 1 | 32797 | 9 years, 13 weeks, 1 day ago |
Reviewed
|
0|i05z67: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-962 | leader/follower coherence issue when follower is receiving a DIFF |
Bug | Closed | Critical | Fixed | Chia-Hung Lin | Camille Fournier | Camille Fournier | 21/Dec/10 13:42 | 23/Nov/11 14:22 | 23/Jan/11 00:31 | 3.3.2 | 3.3.3, 3.4.0 | server | 0 | 3 | ZOOKEEPER-919 | From mailing list: It seems like we rely on the LearnerHandler thread startup to capture all of the missing committed transactions in the SNAP or DIFF, but I don't see anything (especially in the DIFF case) that is preventing us from committing more transactions before we actually start forwarding updates to the new follower. Let me explain using my example from ZOOKEEPER-919. Assume we have quorum already, so the leader can be processing transactions while my follower is starting up. I'm a follower at zxid N-5, the leader is at N. I send my FOLLOWERINFO packet to the leader with that information. The leader gets the proposals from its committed log (time T1), then syncs on the proposal list (LearnerHandler line 267. Why? It's a copy of the underlying proposal list... this might be part of our problem). I check to see if the peerLastZxid is within my max and min committed log and it is, so I'm going to send a diff. I set the zxidToSend to be the maxCommittedLog at time T3 (we already know this is sketchy), and forward the proposals from my copied proposal list starting at the peerLastZxid+1 up to the last proposal transaction (as seen at time T1). After I have queued up all those diffs to send, I tell the leader to startForwarding updates to this follower (line 308). So, let's say that at time T2 I actually swap out the leader to the thread that is handling the various request processors, and see that I got enough votes to commit zxid N+1. I commit N+1 and so my maxCommittedLog at T3 is N+1, but this proposal is not in the list of proposals that I got back at time T1, so I don't forward this diff to the client. Additionally, I processed the commit and removed it from my leader's toBeApplied list. 
So when I call startForwarding for this new follower, I don't see this transaction as a transaction to be forwarded. There's one problem. Let's also imagine, however, that I commit N+1 at time T4. The maxCommittedLog value is consistent with the max of the diff packets I am going to send the follower. But, I still committed N+1 and removed it from the toBeApplied list before calling startForwarding with this follower. How does the follower get this transaction? Does it? To put it another way, here is the thread interaction, hopefully formatted so you can read it... LearnerHandlerThread RequestProcessorThread T1(LH): get list of proposals (COPY) T2(RPT): commit N+1, remove from toBeApplied T3(LH): get maxCommittedLog T4(LH): send diffs from view at T1 T5(LH): startForwarding Or T1(LH): get list of proposals (COPY) T2(LH): get maxCommittedLog T3(RPT): commit N+1, remove from toBeApplied T4(LH): send diffs from view at T1 T5(LH): startForwarding I'm trying to figure out what, if anything, keeps the requests from being committed, removed, and never seen by the follower before it fully starts up. |
47529 | No Perforce job exists for this issue. | 6 | 32798 | 9 years, 9 weeks, 4 days ago | 0|i05z6f: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-961 | Watch recovery after disconnection when connection string contains a prefix |
Bug | Closed | Critical | Fixed | Matthias Spycher | pmpm47 | pmpm47 | 21/Dec/10 11:07 | 23/Nov/11 14:22 | 14/Sep/11 01:51 | 3.3.1 | 3.3.4, 3.4.0 | java client | 0 | 3 | ZOOKEEPER-838 | Windows 32 bits | Let's say you're using connection string "127.0.0.1:2182/foo". 1) put a childrenchanged watch on relative / (that is, on absolute path /foo) 2) stop the zk server 3) start the zk server 4) at this point, the client recovers the connection, and should have put back a watch on relative path /, but instead the client puts a watch on the *absolute* path / - if some other client adds or removes a node under /foo, nothing will happen - if some other client adds or removes a node under /, then you will get an error from the zk client library (string operation error) |
34438 | No Perforce job exists for this issue. | 5 | 32799 | 8 years, 27 weeks, 2 days ago |
Reviewed
|
disconnected watch | 0|i05z6n: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-958 | Flag to turn off autoconsume in hedwig c++ client |
Bug | Closed | Major | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 15/Dec/10 04:18 | 23/Nov/11 14:22 | 21/Dec/10 14:34 | 3.4.0 | 3.4.0 | contrib-hedwig | 0 | 1 | Currently the hedwig cpp client will automatically send a consume message to the server when the calling client indicates that it has received the message. If the client wants to queue the messages and not acknowledge them to the server immediately, it needs to block, which means interfering with any other running callbacks. | 47530 | No Perforce job exists for this issue. | 1 | 32800 | 9 years, 14 weeks ago |
Reviewed
|
0|i05z6v: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-957 | zkCleanup.sh doesn't do anything |
Bug | Closed | Major | Fixed | Ted Dunning | Ted Dunning | Ted Dunning | 13/Dec/10 12:09 | 23/Nov/11 14:21 | 14/Dec/10 22:17 | 3.3.2 | 3.3.3, 3.4.0 | 0 | 1 | Somebody left some echo statements in zkCleanup.sh which prevent the java commands from actually running. Patch coming forthwith. |
47531 | No Perforce job exists for this issue. | 1 | 32801 | 9 years, 15 weeks, 1 day ago |
Reviewed
|
0|i05z73: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-955 | Use Atomic(Integer|Long) for (Z)Xid |
Improvement | Resolved | Trivial | Won't Fix | Thomas Koch | Thomas Koch | Thomas Koch | 07/Dec/10 05:41 | 16/May/14 18:34 | 16/May/14 18:34 | 3.5.0 | java client, server | 0 | 2 | As I read last weekend in the fantastic book "Clean Code", it'd be much faster to use AtomicInteger or AtomicLong instead of synchronization blocks around each access to an int or long. The key difference is that a synchronization block will in any case acquire and release a lock. The atomic classes use "optimistic locking", a CPU operation that only changes a value if it still has not changed since the last read. In most cases the value has not changed since the last visit, so the operation is just as fast as a normal operation. If it had changed, then we read again and try to change again. [1] Clean Code: A Handbook of Agile Software Craftsmanship (Robert C. Martin) |
71224 | No Perforce job exists for this issue. | 1 | 42086 | 5 years, 45 weeks, 6 days ago |
Reviewed
|
Atomic | 0|i07khj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-954 | Findbugs/ClientCnxn: Bug type JLM_JSR166_UTILCONCURRENT_MONITORENTER |
Bug | Patch Available | Minor | Unresolved | Hiroshi Ikeda | Thomas Koch | Thomas Koch | 29/Nov/10 04:21 | 02/Mar/16 20:47 | java client | 0 | 2 | ZOOKEEPER-794 | JLM Synchronization performed on java.util.concurrent.LinkedBlockingQueue in org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn$Packet) Bug type JLM_JSR166_UTILCONCURRENT_MONITORENTER (click for details) In class org.apache.zookeeper.ClientCnxn$EventThread In method org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn$Packet) Type java.util.concurrent.LinkedBlockingQueue Value loaded from field org.apache.zookeeper.ClientCnxn$EventThread.waitingEvents At ClientCnxn.java:[line 411] JLM Synchronization performed on java.util.concurrent.LinkedBlockingQueue in org.apache.zookeeper.ClientCnxn$EventThread.run() Bug type JLM_JSR166_UTILCONCURRENT_MONITORENTER (click for details) In class org.apache.zookeeper.ClientCnxn$EventThread In method org.apache.zookeeper.ClientCnxn$EventThread.run() Type java.util.concurrent.LinkedBlockingQueue Value loaded from field org.apache.zookeeper.ClientCnxn$EventThread.waitingEvents At ClientCnxn.java:[line 436] The respective code: 409 public void queuePacket(Packet packet) { 410 if (wasKilled) { 411 synchronized (waitingEvents) { 412 if (isRunning) waitingEvents.add(packet); 413 else processEvent(packet); 414 } 415 } else { 416 waitingEvents.add(packet); 417 } 418 } 419 420 public void queueEventOfDeath() { 421 waitingEvents.add(eventOfDeath); 422 } 423 424 @Override 425 public void run() { 426 try { 427 isRunning = true; 428 while (true) { 429 Object event = waitingEvents.take(); 430 if (event == eventOfDeath) { 431 wasKilled = true; 432 } else { 433 processEvent(event); 434 } 435 if (wasKilled) 436 synchronized (waitingEvents) { 437 if (waitingEvents.isEmpty()) { 438 isRunning = false; 439 break; 440 } 441 } 442 } |
36643 | No Perforce job exists for this issue. | 2 | 32802 | 4 years, 5 weeks, 3 days ago | 0|i05z7b: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-953 | ZOOKEEPER-940 review project branding requirements, report to board |
Sub-task | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 21:04 | 07/Feb/11 13:30 | 07/Feb/11 13:30 | 0 | 0 | ZOOKEEPER-941 | 47532 | No Perforce job exists for this issue. | 0 | 33358 | 9 years, 7 weeks, 3 days ago | 0|i062mv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-952 | ZOOKEEPER-940 scrub codebase for references to pre-TLP locations. |
Sub-task | Resolved | Major | Not A Problem | Mahadev Konar | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 18:06 | 08/Oct/13 17:55 | 08/Oct/13 17:55 | 0 | 0 | The codebase needs to be scrubbed of references to hadoop and old locations (web site, wiki, svn, mailing lists, etc...) |
214208 | No Perforce job exists for this issue. | 0 | 42087 | 9 years, 16 weeks, 6 days ago | 0|i07khr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-951 | ZOOKEEPER-940 monthly board reports for first 3 months (then quarterly reports) |
Sub-task | Resolved | Major | Implemented | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 18:01 | 08/Oct/13 17:55 | 08/Oct/13 17:55 | 0 | 0 | Board reporting guidelines can be found here: http://apache.org/foundation/board/reporting note that ZOOKEEPER-953 should also be addressed (branding checklist) |
214207 | No Perforce job exists for this issue. | 0 | 42088 | 9 years, 18 weeks, 1 day ago | 0|i07khz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-950 | ZOOKEEPER-940 create bylaws |
Sub-task | Resolved | Major | Fixed | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 17:59 | 07/Feb/11 13:40 | 07/Feb/11 13:40 | 0 | 0 | 47533 | No Perforce job exists for this issue. | 0 | 33359 | 9 years, 7 weeks, 3 days ago | 0|i062n3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-949 | ZOOKEEPER-940 work with infra to move the git mirror |
Sub-task | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 17:37 | 30/Nov/10 13:14 | 30/Nov/10 13:14 | 0 | 0 | 47534 | No Perforce job exists for this issue. | 0 | 33360 | 9 years, 17 weeks, 2 days ago | 0|i062nb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-948 | ZOOKEEPER-940 send mail to the zk mailing lists about the list name changes |
Sub-task | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 13:16 | 24/Nov/10 15:58 | 24/Nov/10 15:58 | 0 | 0 | 47535 | No Perforce job exists for this issue. | 0 | 33361 | 9 years, 18 weeks, 1 day ago | 0|i062nj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-947 | ZOOKEEPER-940 move the wiki content to its new home |
Sub-task | Resolved | Major | Fixed | Benjamin Reed | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 12:50 | 07/Feb/11 00:31 | 07/Feb/11 00:31 | 0 | 0 | 47536 | No Perforce job exists for this issue. | 0 | 33362 | 9 years, 7 weeks, 3 days ago | 0|i062nr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-946 | ZOOKEEPER-940 update howtorelease page with new details (svn, filepaths, notifications and such) |
Sub-task | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 12:49 | 11/Mar/11 01:25 | 11/Mar/11 01:25 | 0 | 0 | 47537 | No Perforce job exists for this issue. | 0 | 33363 | 9 years, 2 weeks, 6 days ago | 0|i062nz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-945 | ZOOKEEPER-940 update legacy website with new mailing list details |
Sub-task | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 12:47 | 24/Nov/10 17:58 | 24/Nov/10 17:58 | 0 | 0 | 47538 | No Perforce job exists for this issue. | 0 | 33364 | 9 years, 18 weeks, 1 day ago | 0|i062o7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-944 | ZOOKEEPER-940 perform a svn move to move the ZK codebase out from under hadoop |
Sub-task | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 12:47 | 24/Nov/10 17:19 | 24/Nov/10 17:17 | 0 | 0 | 47539 | No Perforce job exists for this issue. | 0 | 33365 | 9 years, 18 weeks, 1 day ago | 0|i062of: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-943 | ZOOKEEPER-940 address hudson configuration change for svn move |
Sub-task | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 12:46 | 24/Nov/10 17:58 | 24/Nov/10 17:14 | 0 | 0 | 47540 | No Perforce job exists for this issue. | 0 | 33366 | 9 years, 18 weeks, 1 day ago | 0|i062on: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-942 | ZOOKEEPER-940 address hudson configuration change for mailing list |
Sub-task | Resolved | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 12:46 | 24/Nov/10 15:57 | 24/Nov/10 15:57 | 0 | 0 | 47541 | No Perforce job exists for this issue. | 0 | 33367 | 9 years, 18 weeks, 1 day ago | 0|i062ov: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-941 | ZOOKEEPER-940 setup the new website on zookeeper.apache.org |
Sub-task | Resolved | Major | Fixed | Benjamin Reed | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 12:45 | 07/Feb/11 13:40 | 07/Feb/11 13:40 | 0 | 0 | ZOOKEEPER-953 | This uses the new CMS system. See INFRA-3228 for details. | 47542 | No Perforce job exists for this issue. | 0 | 33368 | 9 years, 16 weeks, 6 days ago | 0|i062p3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-940 | Umbrella JIRA for move to TLP |
Task | Resolved | Major | Implemented | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 24/Nov/10 12:42 | 08/Oct/13 17:55 | 08/Oct/13 17:55 | 0 | 0 | ZOOKEEPER-941, ZOOKEEPER-942, ZOOKEEPER-943, ZOOKEEPER-944, ZOOKEEPER-945, ZOOKEEPER-946, ZOOKEEPER-947, ZOOKEEPER-948, ZOOKEEPER-949, ZOOKEEPER-950, ZOOKEEPER-951, ZOOKEEPER-952, ZOOKEEPER-953 | This is an umbrella jira for our move to TLP status. Please create subtasks for any issues you find related to the move. Note that INFRA-3228 is now closed, so a number of infra related issues have already been closed. This jira (subs) is for additional issues we need to address. |
214206 | No Perforce job exists for this issue. | 0 | 42089 | 9 years, 18 weeks, 1 day ago | 0|i07ki7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-939 | the threads number of a zookeeper is increased all the time |
Bug | Resolved | Major | Duplicate | Unassigned | Qian Ye | Qian Ye | 24/Nov/10 02:53 | 05/Sep/11 23:11 | 05/Sep/11 23:10 | 3.3.0 | server | 0 | 0 | ZOOKEEPER-880 | Linux 2.6.9-52bs #2 SMP Fri Jan 26 13:34:38 CST 2007 x86_64 x86_64 x86_64 GNU/Linux | I have a group of zookeeper servers, there are three servers in this group. server.0=10.81.4.11:2888:3888 server.1=10.23.240.93:2888:3888 server.2=10.23.244.224:2888:3888 At first, the cluster ran well. About several days ago, I shut down the zookeeper process on one of servers(server.2)., and today, I find that the other two servers run in wired status(the network is fine). The zookeeper process take pretty much resource on the two servers: on server.1 (it's the leader) PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 26836 work 18 0 12.8g 803m 8724 S 3.7 10.1 195:56.56 java $ ll /proc/26836/fd/ | wc -l 3586 [work@tc-test-aos03.tc.baidu.com conf]$ ll /proc/26836/task/ | wc -l 10510 some warning log: 2010-11-24 15:37:48,705 - WARN [Thread-37409:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:39:48,626 - WARN [Thread-37414:QuorumCnxManager$RecvWorker@658] - Connection broken: java.nio.channels.AsynchronousCloseException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629) 2010-11-24 15:39:48,656 - WARN [Thread-37413:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) at 
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570) 2010-11-24 15:39:48,657 - WARN [Thread-37413:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:41:48,614 - WARN [Thread-37417:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570) 2010-11-24 15:41:48,643 - WARN [Thread-37418:QuorumCnxManager$RecvWorker@658] - Connection broken: java.nio.channels.AsynchronousCloseException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629) 2010-11-24 15:41:48,662 - WARN [Thread-37417:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:43:48,627 - WARN [Thread-37421:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570) 2010-11-24 15:43:48,627 - WARN [Thread-37422:QuorumCnxManager$RecvWorker@658] - Connection broken: 
java.nio.channels.AsynchronousCloseException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629) 2010-11-24 15:43:48,654 - WARN [Thread-37421:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:44:48,622 - WARN [Thread-37424:QuorumCnxManager$RecvWorker@658] - Connection broken: java.nio.channels.AsynchronousCloseException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629) 2010-11-24 15:44:48,652 - WARN [Thread-37423:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570) 2010-11-24 15:44:48,653 - WARN [Thread-37423:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:45:48,668 - WARN [Thread-37426:QuorumCnxManager$RecvWorker@658] - Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630) 2010-11-24 15:46:48,647 - WARN [Thread-37427:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue java.lang.InterruptedException at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570) 2010-11-24 15:46:48,722 - WARN [Thread-37428:QuorumCnxManager$RecvWorker@658] - Connection broken: java.nio.channels.AsynchronousCloseException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629) 2010-11-24 15:46:48,736 - WARN [Thread-37427:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:47:48,687 - WARN [Thread-37430:QuorumCnxManager$RecvWorker@658] - Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630) on server.0 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27322 work 19 0 15.2g 943m 9140 S 38.6 11.8 1396:51 java $ ll /proc/27322/fd/ | wc -l 3587 $ ll /proc/27322/task/ | wc -l 12938 2010-11-24 15:37:49,269 - WARN [Thread-37407:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:39:49,235 - WARN [Thread-37412:QuorumCnxManager$RecvWorker@658] - Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630) 2010-11-24 15:39:49,410 - WARN [Thread-37411:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570) 2010-11-24 15:39:49,411 - WARN [Thread-37411:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:41:49,314 - WARN [Thread-37416:QuorumCnxManager$RecvWorker@658] - Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630) 2010-11-24 15:41:49,383 - WARN [Thread-37415:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570) 2010-11-24 15:41:49,405 - WARN [Thread-37415:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:43:49,372 - WARN [Thread-37420:QuorumCnxManager$RecvWorker@658] - Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630) 2010-11-24 15:43:49,512 - WARN [Thread-37419:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) at 
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570) 2010-11-24 15:43:49,513 - WARN [Thread-37419:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:44:49,407 - WARN [Thread-37422:QuorumCnxManager$RecvWorker@658] - Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630) 2010-11-24 15:45:49,645 - WARN [Thread-37424:QuorumCnxManager$RecvWorker@658] - Connection broken: java.nio.channels.AsynchronousCloseException at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629) 2010-11-24 15:45:49,781 - WARN [Thread-37423:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570) 2010-11-24 15:45:49,799 - WARN [Thread-37423:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:46:49,495 - WARN [Thread-37427:QuorumCnxManager$RecvWorker@658] - Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630) 2010-11-24 15:47:49,541 - WARN [Thread-37429:QuorumCnxManager$RecvWorker@658] - Connection broken: java.nio.channels.AsynchronousCloseException at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263) at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:629) 2010-11-24 15:47:49,622 - WARN [Thread-37428:QuorumCnxManager$SendWorker@581] - Interrupted while waiting for message on queue java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1976) at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342) at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:570) 2010-11-24 15:47:49,622 - WARN [Thread-37428:QuorumCnxManager$SendWorker@589] - Send worker leaving thread 2010-11-24 15:48:48,827 - WARN [Thread-37431:QuorumCnxManager$RecvWorker@658] - Connection broken: java.io.IOException: Channel eof at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:630) What's more, the number of threads under the zookeeper process is still increasing time by time. It seems that , something is wrong in communication of the two servers. Have anyone met such problem before? |
62364 | No Perforce job exists for this issue. | 0 | 32803 | 8 years, 29 weeks, 2 days ago | server | 0|i05z7j: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-938 | Support Kerberos authentication of clients. |
New Feature | Closed | Major | Fixed | Eugene Joseph Koontz | Eugene Joseph Koontz | Eugene Joseph Koontz | 23/Nov/10 12:25 | 28/Apr/14 00:03 | 18/Aug/11 18:05 | 3.4.0 | java client, server | 0 | 15 | HBASE-3025, ZOOKEEPER-1920, ZOOKEEPER-1236, ZOOKEEPER-1373, ZOOKEEPER-1437, ZOOKEEPER-1181, ZOOKEEPER-1185, ZOOKEEPER-1195, ZOOKEEPER-1112, ZOOKEEPER-1045, HADOOP-4487, GIRAPH-265, ZOOKEEPER-1420, ZOOKEEPER-1469, ZOOKEEPER-1422, HBASE-2418, HIVE-2467, ZOOKEEPER-329, ZOOKEEPER-896 | Support Kerberos authentication of clients. The following usage would let an admin use Kerberos authentication to assign ACLs to authenticated clients. 1. Admin logs into zookeeper (not necessarily through Kerberos however). 2. Admin decides that a new node called '/mynode' should be owned by the user 'zkclient' and have full permissions on this. 3. Admin does: zk> create /mynode content sasl:zkclient@FOOFERS.ORG:cdrwa 4. User 'zkclient' logins to kerberos using the command line utility 'kinit'. 5. User connects to zookeeper server using a Kerberos-enabled version of zkClient (ZookeeperMain). 6. Behind the scenes, the client and server exchange authentication information. User is now authenticated as 'zkclient'. 7. User accesses /mynode with permissions 'cdrwa'. |
47543 | No Perforce job exists for this issue. | 17 | 33369 | 8 years, 31 weeks, 6 days ago | ZOOKEEPER-938 : support Kerberos authentication via SASL. |
Reviewed
|
0|i062pb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
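The ACL in step 3, sasl:zkclient@FOOFERS.ORG:cdrwa, uses ZooKeeper's scheme:id:perms notation, where the trailing letters map to permission bits. The sketch below shows that mapping; the bit values match org.apache.zookeeper.ZooDefs.Perms, but the parser itself is illustrative, not code from this patch:

```java
// Maps a ZooKeeper ACL perms string such as "cdrwa" to its bitmask.
// Bit values follow org.apache.zookeeper.ZooDefs.Perms.
final class PermParser {
    static final int READ = 1, WRITE = 2, CREATE = 4, DELETE = 8, ADMIN = 16;

    static int parse(String perms) {
        int bits = 0;
        for (char c : perms.toCharArray()) {
            switch (c) {
                case 'r': bits |= READ;   break;
                case 'w': bits |= WRITE;  break;
                case 'c': bits |= CREATE; break;
                case 'd': bits |= DELETE; break;
                case 'a': bits |= ADMIN;  break;
                default: throw new IllegalArgumentException("unknown perm: " + c);
            }
        }
        return bits;
    }
}
```

So "cdrwa" grants all five permissions (bitmask 31), which is why the authenticated 'zkclient' principal in step 7 has full access to /mynode.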
| ZooKeeper | ZOOKEEPER-937 | test -e not available on solaris /bin/sh |
Bug | Closed | Major | Fixed | Erik Hetzner | Erik Hetzner | Erik Hetzner | 19/Nov/10 17:12 | 23/Nov/11 14:21 | 07/Dec/10 14:02 | 3.3.0, 3.3.1, 3.3.2 | 3.4.0 | scripts | 0 | 1 | SunOS xxx 5.10 Generic_142901-13 i86pc i386 i86pc Solaris |
test -e FILENAME is not supported by /bin/sh on Solaris. This is used in bin/zkEnv.sh. We can substitute test -f FILENAME. Attaching a patch. | 47544 | No Perforce job exists for this issue. | 2 | 32804 | 9 years, 16 weeks ago |
Reviewed
|
0|i05z7r: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
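The fix described in ZOOKEEPER-937 is a one-token substitution: Solaris /bin/sh lacks `test -e`, while `test -f` (regular file exists) is portable. A minimal sketch of the portable check — the temp file is only for demonstration, not a path from zkEnv.sh:

```shell
# Portable existence check: use `test -f` (supported by Solaris /bin/sh)
# instead of `test -e` (not supported there).
f=$(mktemp)
if [ -f "$f" ]; then
  echo "config found"
fi
rm -f "$f"
```

Note that `-f` is slightly narrower than `-e` (it requires a regular file rather than any path), which is acceptable here since zkEnv.sh is testing for configuration files.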
| ZooKeeper | ZOOKEEPER-936 | zkpython is leaking ACL_vector |
Bug | Open | Major | Unresolved | Unassigned | Gustavo Niemeyer | Gustavo Niemeyer | 18/Nov/10 11:02 | 14/Dec/19 06:08 | 3.7.0 | contrib-bindings | 0 | 3 | It looks like there are no calls to deallocate_ACL_vector() within zookeeper.c in the zkpython binding, which means that (at least) the result of zoo_get_acl() must be leaking. | 36644 | No Perforce job exists for this issue. | 0 | 32805 | 8 years, 41 weeks, 1 day ago | 0|i05z7z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-935 | Concurrent primitives library - shared lock |
Improvement | Open | Minor | Unresolved | Chia-Hung Lin | Chia-Hung Lin | Chia-Hung Lin | 18/Nov/10 04:33 | 05/Feb/20 07:15 | 3.7.0, 3.5.8 | recipes | 0 | 3 | Debian squeeze JDK 1.6.x zookeeper trunk |
I created this jira to add a shared lock function. The function follows the recipes at http://hadoop.apache.org/zookeeper/docs/r3.1.2/recipes.html#Shared+Locks |
36 | No Perforce job exists for this issue. | 1 | 42090 | 8 years, 16 weeks ago | zookeeper, shared lock, recipes, lock | 0|i07kif: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-934 | ZOOKEEPER-900 Add sanity check for server ID |
Sub-task | Open | Major | Unresolved | Unassigned | Vishal Kher | Vishal Kher | 17/Nov/10 11:14 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | 0 | 0 | ZOOKEEPER-883, ZOOKEEPER-880 | 2. Should I add a check to reject connections from peers that are not listed in the configuration file? Currently, we are not doing any sanity check for server IDs. I think this might fix ZOOKEEPER-851. The fix is simple. However, I am not sure if anyone in community is relying on this ability. |
36645 | No Perforce job exists for this issue. | 0 | 42091 | 9 years, 18 weeks, 6 days ago | 0|i07kin: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-933 | ZOOKEEPER-900 Remove wildcard QuorumPeer.OBSERVER_ID |
Sub-task | Open | Major | Unresolved | Unassigned | Vishal Kher | Vishal Kher | 17/Nov/10 11:11 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | 0 | 1 | ZOOKEEPER-991 | 1. I have a question about the following piece of code in QCM: if (remoteSid == QuorumPeer.OBSERVER_ID) { /* * Choose identifier at random. We need a value to identify * the connection. */ remoteSid = observerCounter--; LOG.info("Setting arbitrary identifier to observer: " + remoteSid); } Should we allow this? The problem with this code is that if a peer connects twice with QuorumPeer.OBSERVER_ID, we will end up creating threads for this peer twice. This could result in redundant SendWorker/RecvWorker threads. I haven't used observers yet. The documentation http://hadoop.apache.org/zookeeper/docs/r3.3.0/zookeeperObservers.html says that just like followers, observers should have server IDs. In which case, why do we want to provide a wild-card? |
36646 | No Perforce job exists for this issue. | 0 | 42092 | 9 years, 18 weeks, 6 days ago | 0|i07kiv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-932 | ZOOKEEPER-900 Move blocking read/write calls to SendWorker and RecvWorker Threads |
Sub-task | Open | Major | Unresolved | Vishal Kher | Vishal Kher | Vishal Kher | 17/Nov/10 11:08 | 05/Feb/20 07:16 | 3.3.2 | 3.7.0, 3.5.8 | leaderElection | 0 | 1 | Copying relevant comments: Vishal K added a comment - 02/Nov/10 02:09 PM Hi Flavio, I have a suggestion for changing the blocking IO code in QuorumCnxManager. It keeps the current code structure and requires a small amount of changes. I am not sure if these comments should go in ZOOKEEPER-901. ZOOKEEPER-901 is probably addressing netty as well. Please feel free to close this JIRA if you intend to make all the changes as a part of ZOOKEEPER-901. Basically we jusy need to move parts of initiateConnection and receiveConnection to SenderWorker and ReceiveWorker. A. Current flow for receiving connection: 1. accept connection in Listener.run() 2. receiveConnection() * Read remote server's ID * Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID). * kill current set of SenderWorker and ReciveWorker threads * Start a new pair B Current flow for initiating connection: 1. In connectOne(), connect if not already connected. else return. 2. send my ID to the remote server 3. if my ID < remote server disconnect and return 4. if my ID > remote server * kill current set of SenderWorker and ReceiveWorkter threads for the remote server * Start a new pair Proposed changes: Move the code that performs any blocking IO in SenderWorker and ReceiveWorker. A. Proposed flow for receiving connection: 1. accept connection in Listener.run() 2. receiveConnection() * kill current set of SenderWorker and ReciveWorker threads * Start a new pair Proposed changed to SenderWorker: * Read remote server's ID * Take action based on my ID and remote server's ID (disconnect and reconnect if my ID is > remote server's ID). * Proceed to normal operation B Proposed flow for initiating connection: 1. in connectOne(), return if already connected 2. 
Start a new SenderWorker and ReceiveWorker pair 2. In SenderWorker * connect to remote server * write my ID * if my ID < remote server disconnect and return (shutdown the pair). * Proceed to normal operation Questions: * In QuorumCnxManager, is it necessary to kill the current pair and restart a new one every time we receive a connect request? * In receiveConnection we may choose to reject an accepted connection if a thread in SenderWorker is in the process of connecting. Otherwise a server with ID < remote server may keep sending frequent connect request that will result in the remote server closing connections for this peer. But I think we add a delay before sending notifications, which might be good enough to prevent this problem. Let me know what you think about this. I can also help with the implementation. Flavio Junqueira added a comment - 03/Nov/10 05:28 PM Hi Vishal, I like your proposal, it seems reasonable and not difficult to implement. On your questions: 1. I don't think it is necessary to kill a pair SenderWorker/RecvWorker every time, and I'd certainly support changing it; 2. I'm not sure where you're suggesting to introduce a delay. In the FLE code, a server sends a new batch of notifications if it changes its vote or if it times out waiting for a new notification. This timeout value increases over time. I was actually thinking that we should reset the timeout value upon receiving a notification. I think this is a bug.... Given that it is your proposal, I'd be happy to let you take a stab at it and help you out if you need a hand. Does it make sense for you? |
66890 | No Perforce job exists for this issue. | 5 | 42093 | 8 years, 35 weeks, 2 days ago | 0|i07kj3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
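Both the current and proposed flows above rely on the same tie-break: when two peers initiate connections to each other simultaneously, only the connection initiated by the server with the larger ID survives, so exactly one connection remains per pair. A minimal statement of that rule (helper name hypothetical, not from the patch):

```java
// QuorumCnxManager-style tie-break for simultaneous connection attempts:
// the initiating side keeps its connection only if its server ID is
// greater than the remote server's ID. Because the comparison is strict,
// the two sides always reach opposite decisions for distinct IDs.
final class TieBreak {
    /** true if the initiating side (mySid) should keep its connection. */
    static boolean keepInitiatedConnection(long mySid, long remoteSid) {
        return mySid > remoteSid;
    }
}
```

The proposal's point is not to change this rule but to move the blocking read/write that enforces it out of Listener/connectOne and into the SendWorker/RecvWorker threads.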
| ZooKeeper | ZOOKEEPER-930 | Hedwig c++ client uses a non thread safe logging library |
Bug | Resolved | Major | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 15/Nov/10 04:55 | 17/Nov/10 05:55 | 16/Nov/10 13:28 | 3.3.2 | contrib-hedwig | 0 | 1 | 47545 | No Perforce job exists for this issue. | 2 | 32806 | 9 years, 19 weeks, 1 day ago |
Reviewed
|
0|i05z87: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-929 | hudson qabot incorrectly reporting issues as number 909 when the patch from 908 is the one being tested |
Bug | Resolved | Major | Cannot Reproduce | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 11/Nov/10 12:03 | 08/Oct/13 17:56 | 08/Oct/13 17:56 | build | 0 | 0 | Hi Nigel can you take a look at this? Following you'll see the email I got, notice that the patch is patch 908, however if you look at the hudson page it's linked to the change is documented as 909 patch file applied https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25/changes I looked at both jiras ZOOKEEPER-908 and ZOOKEEPER-909 both of these look good (the right names on patches) and qabot actually updated 908 with the comment (failure). However the "change" is listed as 909 which is wrong. [exec] -1 overall. Here are the results of testing the latest attachment [exec] http://issues.apache.org/jira/secure/attachment/12459361/ZOOKEEPER-908.patch [exec] against trunk revision 1033770. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] -1 tests included. The patch doesn't appear to include any new or modified tests. [exec] Please justify why no new tests are needed for this patch. [exec] Also please list what manual steps were performed to verify this patch. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 core tests. The patch passed core unit tests. [exec] [exec] +1 contrib tests. The patch passed contrib unit tests. 
[exec] [exec] Test results: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//testReport/ [exec] Findbugs warnings: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html [exec] Console output: https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/25//console [exec] [exec] This message is automatically generated. [exec] |
214205 | No Perforce job exists for this issue. | 0 | 32807 | 9 years, 15 weeks, 4 days ago | 0|i05z8f: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-928 | Follower should stop following and start FLE if it does not receive pings from the leader |
Bug | Resolved | Critical | Won't Fix | Unassigned | Vishal Kher | Vishal Kher | 10/Nov/10 15:06 | 11/Nov/10 12:07 | 10/Nov/10 16:40 | 3.3.2 | quorum, server | 0 | 2 | In Follower.followLeader(), after syncing with the leader, the follower does: while (self.isRunning()) { readPacket(qp); processPacket(qp); } It looks like it relies on socket timeout expiry to figure out if the connection with the leader has gone down. So a follower *with no clients* may never notice a faulty leader if the leader has a software hang but the TCP connections with the peers are still valid. Since it has no clients, it won't heartbeat with the leader. If a majority of followers are not connected to any clients, then FLE will fail even if other followers attempt to elect a new leader. We should keep track of pings received from the leader and, if we haven't seen a ping packet from the leader for (syncLimit * tickTime), give up following the leader. |
214204 | No Perforce job exists for this issue. | 0 | 32808 | 9 years, 20 weeks ago | 0|i05z8n: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
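The fix proposed in ZOOKEEPER-928 above — track pings from the leader and abandon it once (syncLimit * tickTime) elapses without one — can be sketched as follows. This is an illustrative model, not ZooKeeper source; the class PingDeadline and its methods are hypothetical, and only tickTime and syncLimit correspond to real zoo.cfg settings.

```java
// Hypothetical sketch of the proposed fix: the follower records the arrival
// time of each ping from the leader and abandons the leader once no ping has
// arrived within syncLimit * tickTime, instead of relying on socket timeouts.
public class PingDeadline {
    private final long tickTime;   // ms per tick, as in zoo.cfg
    private final int syncLimit;   // ticks a follower may lag, as in zoo.cfg
    private long lastPingMillis;

    public PingDeadline(long tickTime, int syncLimit, long nowMillis) {
        this.tickTime = tickTime;
        this.syncLimit = syncLimit;
        this.lastPingMillis = nowMillis;
    }

    // Called whenever a ping packet arrives from the leader.
    public void pingReceived(long nowMillis) {
        lastPingMillis = nowMillis;
    }

    // True when the follower should stop following and start leader election.
    public boolean leaderConsideredDead(long nowMillis) {
        return nowMillis - lastPingMillis > tickTime * syncLimit;
    }
}
```

With tickTime=2000 and syncLimit=5, a follower would give up after 10 seconds of silence regardless of whether any client traffic is flowing, which addresses the "follower with no clients" case in the report.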
| ZooKeeper | ZOOKEEPER-927 | there are currently 24 RAT warnings in the build -- address directly or via exclusions |
Improvement | Resolved | Minor | Fixed | Michi Mutsuzaki | Patrick D. Hunt | Patrick D. Hunt | 09/Nov/10 14:00 | 19/Jul/14 07:24 | 19/Jul/14 00:40 | 3.5.0 | build | 0 | 3 | We should either fix these, or add exclusions to build.xml. afaik the current warnings are not real errors/problems, but we should address this directly. (I eyeball it before every release) |
36647 | No Perforce job exists for this issue. | 3 | 42094 | 5 years, 35 weeks, 5 days ago |
Reviewed
|
0|i07kjb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-926 | Fork Hadoop common's test-patch.sh and modify for Zookeeper |
Improvement | Closed | Major | Fixed | Nigel Daley | Nigel Daley | Nigel Daley | 09/Nov/10 02:43 | 10/Dec/15 21:54 | 10/Nov/10 01:23 | 3.4.0 | build | 0 | 0 | Zookeeper currently uses the test-patch.sh script from the Hadoop nightly dir. This is now out of date. I propose we just copy the updated one in Hadoop common and then modify for ZK. This will also help as ZK moves out of Hadoop to it's own TLP. | 47546 | No Perforce job exists for this issue. | 1 | 33370 | 9 years, 18 weeks, 5 days ago | 0|i062pj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-925 | Consider maven site generation to replace our forrest site and documentation generation |
Task | Closed | Major | Fixed | Tamas Penzes | Patrick D. Hunt | Patrick D. Hunt | 08/Nov/10 20:53 | 02/Apr/19 06:40 | 07/Dec/18 06:34 | 3.5.4, 3.6.0, 3.4.13 | 3.6.0, 3.5.5, 3.4.14 | documentation | 0 | 7 | ZOOKEEPER-3153, ZOOKEEPER-3154, ZOOKEEPER-3155, ZOOKEEPER-3184 | BIGTOP-65, ZOOKEEPER-963, AVRO-319, HADOOP-7069, INFRA-3228 | See WHIRR-19 for some background. In whirr we looked at a number of site/doc generation facilities. In the end Maven site generation plugin turned out to be by far the best option. You can see our nascent site here (no attempt at styling,etc so far): http://incubator.apache.org/whirr/ In particular take a look at the quick start: http://incubator.apache.org/whirr/quick-start-guide.html which was generated from http://svn.apache.org/repos/asf/incubator/whirr/trunk/src/site/confluence/quick-start-guide.confluence notice this was standard wiki markup (confluence wiki markup, same as available from apache) You can read more about mvn site plugin here: http://maven.apache.org/guides/mini/guide-site.html Notice that other formats are available, not just confluence markup, also note that you can use different markup formats if you like in the same site (although probably not a great idea, but in some cases might be handy, for example whirr uses the confluence wiki, so we can pretty much copy/paste source docs from wiki to our site (svn) if we like) Re maven vs our current ant based build. It's probably a good idea for us to move the build to maven at some point. We could initially move just the doc generation, and then incrementally move functionality from build.xml to mvn over a longer time period. |
100% | 69000 | 0 | 214203 | No Perforce job exists for this issue. | 3 | 42095 | 1 year, 14 weeks, 6 days ago | 0|i07kjj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-924 | Recipe: Fault tolerant communication layer using Zookeeper |
Task | Open | Major | Unresolved | Unassigned | kishore gopalakrishna | kishore gopalakrishna | 08/Nov/10 15:18 | 08/Nov/10 15:20 | recipes | 0 | 0 | ZOOKEEPER-923 | Any | This recipe caters to the following use case: there are S (active) + s (standby) sender nodes and R (active) + r (standby) receiver nodes. The objective is the following: * If one of the S active servers goes down, a standby node should take up the task. * If one of the R active servers goes down, a standby node should take up the task. * When there is a change in receiver, the sender must get updated and send the message to the correct destination. This also uses the recipe described in https://issues.apache.org/jira/browse/ZOOKEEPER-923 This was developed for a different project, S4, which is also open sourced: http://s4.io/. The communication and task management layers are completely independent of S4 and can be used in any application. |
36648 | No Perforce job exists for this issue. | 0 | 42096 | 9 years, 20 weeks, 3 days ago | 0|i07kjr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-923 | TaskManagement Using Zookeeper Recipe |
Task | Open | Major | Unresolved | Unassigned | kishore gopalakrishna | kishore gopalakrishna | 08/Nov/10 15:17 | 08/Nov/10 15:18 | recipes | 0 | 0 | ZOOKEEPER-924 | Any | A typical use case in a distributed system is: "There are T tasks and P processes running, but only T processes must be active at all times [P > T], with the remaining P-T processes acting as standbys ready to take up a task when one or more active processes fail." Zookeeper provides an excellent service which can be used to coordinate among the P processes; using its locking mechanism we can ensure that there are always T processes active. Without a central coordinating service there will generally be 2T processes [i.e. at least one backup for each process]. With Zookeeper we can decide P based on the failure rate. The assumptions here are: 1. At any time we have P > T. P can be chosen appropriately based on the failure rate. 2. The tasks are stateless. That is, any process P_i that takes up a task T_j does not know the state of the process P_k which previously processed T_j. This is not entirely true, and there are ways to overcome this drawback on a case-by-case basis. This was developed for a different project, S4, which is also open sourced: http://s4.io/. The communication and task management layers are completely independent of S4 and can be used in any application. |
36649 | No Perforce job exists for this issue. | 0 | 42097 | 9 years, 20 weeks, 3 days ago | 0|i07kjz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-922 | enable faster timeout of sessions in case of unexpected socket disconnect |
Improvement | Open | Major | Unresolved | Camille Fournier | Camille Fournier | Camille Fournier | 08/Nov/10 10:43 | 05/Feb/20 07:15 | 3.7.0, 3.5.8 | server | 2 | 9 | HBASE-5843 | In the case when a client connection is closed due to socket error instead of the client calling close explicitly, it would be nice to enable the session associated with that client to time out faster than the negotiated session timeout. This would enable a zookeeper ensemble that is acting as a dynamic discovery provider to remove ephemeral nodes for crashed clients quickly, while allowing for a longer heartbeat-based timeout for java clients that need to do long stop-the-world GC. I propose doing this by setting the timeout associated with the crashed session to "minSessionTimeout". |
70759 | No Perforce job exists for this issue. | 1 | 42098 | 9 years, 7 weeks, 1 day ago | 0|i07kk7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
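The proposal above — expire sessions that died with a socket error at minSessionTimeout rather than the negotiated timeout — amounts to a small policy choice that could look like the sketch below. The class and method names are hypothetical; only minSessionTimeout is a real ZooKeeper server setting.

```java
// Hypothetical sketch of the proposed policy: a session whose connection
// dropped with a socket error (rather than an explicit close) is given the
// server's minSessionTimeout, while cleanly heartbeating clients keep their
// longer negotiated timeout. Not ZooKeeper source code.
public class SessionExpiryPolicy {
    static long effectiveTimeoutMs(long negotiatedMs, long minSessionTimeoutMs,
                                   boolean diedWithSocketError) {
        if (diedWithSocketError) {
            // Never lengthen a timeout that was already below the minimum.
            return Math.min(negotiatedMs, minSessionTimeoutMs);
        }
        return negotiatedMs; // normal case: honor the negotiated session timeout
    }
}
```

This keeps the long heartbeat-based timeout for clients that pause in stop-the-world GC while letting the ensemble drop ephemeral nodes of crashed clients quickly.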
| ZooKeeper | ZOOKEEPER-921 | zkPython incorrectly checks for existence of required ACL elements |
Bug | Closed | Major | Fixed | Nicholas Knight | Nicholas Knight | Nicholas Knight | 08/Nov/10 03:51 | 23/Nov/11 14:22 | 28/Dec/10 19:46 | 3.3.1, 3.4.0 | 3.3.3, 3.4.0 | contrib-bindings | 0 | 2 | Mac OS X 10.6.4, included Python 2.6.1 | Calling {{zookeeper.create()}} seems, under certain circumstances, to be corrupting a subsequent call to Python's {{logging}} module. Specifically, if the node does not exist (but its parent does), I end up with a traceback like this when I try to make the logging call: {noformat} Traceback (most recent call last): File "zktest.py", line 21, in <module> logger.error("Boom?") File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1046, in error if self.isEnabledFor(ERROR): File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1206, in isEnabledFor return level >= self.getEffectiveLevel() File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/logging/__init__.py", line 1194, in getEffectiveLevel while logger: TypeError: an integer is required {noformat} But if the node already exists, or the parent does not exist, I get the appropriate NodeExists or NoNode exceptions. I'll be attaching a test script that can be used to reproduce this behavior. |
47547 | No Perforce job exists for this issue. | 2 | 32809 | 9 years, 13 weeks, 1 day ago |
Reviewed
|
0|i05z8v: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-920 | L7 (application layer) ping support |
New Feature | Open | Minor | Unresolved | Chang Song | Chang Song | Chang Song | 08/Nov/10 02:53 | 05/Feb/20 07:16 | 3.3.1 | 3.7.0, 3.5.8 | c client | 1 | 4 | Zookeeper is used in applications where fault tolerance is important. Its client I/O thread sends/receives heartbeats to/from the Zookeeper ensemble to stay connected. However, a healthy heartbeat does not always mean that the application that uses the Zookeeper client is in good health; it only means that the ZK client thread is in good health. Thus I needed something that can be tagged onto the Zookeeper ping that represents L7 (application) health as well. I have modified the C client source to support this in a minimal way. I am new to Zookeeper, so please code review this code. I am actually using this code in our in-house solution. https://github.com/tru64ufs/zookeeper/commit/2196d6d5114a2fd2c0a3bc9a55f4494d47d2aece Thank you very much. |
70773 | No Perforce job exists for this issue. | 1 | 42099 | 7 years, 1 week, 2 days ago | 0|i07kkf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-919 | Ephemeral nodes remains in one of ensemble after deliberate SIGKILL |
Bug | Closed | Blocker | Duplicate | Unassigned | Chang Song | Chang Song | 04/Nov/10 09:43 | 23/Nov/11 14:22 | 18/Nov/11 20:11 | 3.3.1 | 3.3.3, 3.4.0 | server | 0 | 2 | ZOOKEEPER-962 | Linux CentOS 5.3 64bit, JDK 1.6.0-22 SLES 11 |
I was testing the stability of a Zookeeper ensemble for production deployment: a three-node ensemble cluster configuration. In a loop, I kill/restart three Zookeeper clients that created one ephemeral node each, and at the same time I killed the Java process on one of the ensemble servers (don't know if it was the leader or not). Then I restarted Zookeeper on that server. It turns out that on two zookeeper ensemble servers all the ephemeral nodes are gone (as they should be), but on the newly started Zookeeper server the two old ephemeral nodes stayed. The zookeeper didn't restart in standalone mode, since new ephemeral nodes get created on all ensemble servers. I captured the log. 2010-11-04 17:48:50,201 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:NIOServerCnxn$Factory@250] - Accepted socket connection from /10.25.131.21:11191 2010-11-04 17:48:50,202 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:NIOServerCnxn@776] - Client attempting to establish new session at /10.25.131.21:11191 2010-11-04 17:48:50,203 - INFO [CommitProcessor:1:NIOServerCnxn@1579] - Established session 0x12c160c31fc000b with negotiated timeout 30000 for client /10.25.131.21:11191 2010-11-04 17:48:50,206 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:NIOServerCnxn@633] - EndOfStreamException: Unable to read additional data from client sessionid 0x12c160c31fc000b, likely client has closed socket 2010-11-04 17:48:50,207 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:17288:NIOServerCnxn@1434] - Closed socket connection for client /10.25.131.21:11191 which had sessionid 0x12c160c31fc000b 2010-11-04 17:48:50,207 - ERROR [CommitProcessor:1:NIOServerCnxn@444] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:417) at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1508) at 
org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:367) at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:73) |
47548 | No Perforce job exists for this issue. | 4 | 32810 | 9 years, 9 weeks, 1 day ago | 0|i05z93: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-917 | Leader election selected incorrect leader |
Bug | Resolved | Critical | Not A Problem | Unassigned | Alexandre Hardy | Alexandre Hardy | 03/Nov/10 08:33 | 18/Nov/11 20:01 | 04/Nov/10 08:40 | 3.2.2 | leaderElection, server | 0 | 1 | Cloudera distribution of zookeeper (patched to never cache DNS entries) Debian lenny |
We had three nodes running zookeeper: * 192.168.130.10 * 192.168.130.11 * 192.168.130.14 192.168.130.11 failed, and was replaced by a new node 192.168.130.13 (automated startup). The new node had not participated in any zookeeper quorum previously. The node 192.168.130.11 was permanently removed from service and could not contribute to the quorum any further (powered off). DNS entries were updated for the new node to allow all the zookeeper servers to find the new node. The new node 192.168.130.13 was selected as the LEADER, despite the fact that it had not seen the latest zxid. This particular problem has not been verified with later versions of zookeeper, and no attempt has been made to reproduce this problem as yet. |
214202 | No Perforce job exists for this issue. | 1 | 32811 | 9 years, 20 weeks, 2 days ago | 0|i05z9b: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-916 | Problem receiving messages from subscribed channels in c++ client |
Bug | Resolved | Major | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 03/Nov/10 05:10 | 05/Nov/10 06:52 | 05/Nov/10 02:45 | contrib-hedwig | 0 | 1 | We see this bug with receiving messages from a subscribed channel. This problem seems to happen with larger messages. The flow is to first read at least 4 bytes from the socket channel. Extract the first 4 bytes to get the message size. If we've read enough data into the buffer already, we're done, so we invoke the messageReadCallbackHandler passing the channel and message size. If not, then do an async read for at least the remaining amount of bytes in the message from the socket channel. When done, invoke the messageReadCallbackHandler. The problem seems to be that when the second async read is done, the same sizeReadCallbackHandler is invoked instead of the messageReadCallbackHandler. The result is that we then try to read the first 4 bytes again from the buffer. This will get a random message size and screw things up. I'm not sure if it's an incorrect use of the boost asio async_read function or whether we're doing the boost bind to the callback function incorrectly. 
101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler system:0,512 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message size: 512 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of incoming message 599, currently in buffer 508 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 91 from channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler system:0, 91 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of buffer before reading message size: 599 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: size of incoming message 134287360, currently in buffer 595 channel(0x80b7a18) 101015 15:30:40.108 DEBUG hedwig.channel.cpp - DuplexChannel::sizeReadCallbackHandler: Still have more data to read, 134286765 from channel(0x80b7a18) |
47549 | No Perforce job exists for this issue. | 1 | 32812 | 9 years, 20 weeks, 6 days ago |
Reviewed
|
0|i05z9j: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-915 | Errors that happen during sync() processing at the leader do not get propagated back to the client. |
Bug | In Progress | Major | Unresolved | gaoshu | Benjamin Reed | Benjamin Reed | 28/Oct/10 18:43 | 05/Feb/20 07:11 | 3.7.0, 3.5.8 | 1 | 3 | 0 | 600 | ZOOKEEPER-907 | If an error in sync() processing happens at the leader (SESSION_MOVED for example), they are not propagated back to the client. | 100% | 100% | 600 | 0 | pull-request-available | 36650 | No Perforce job exists for this issue. | 0 | 32813 | 2 years, 29 weeks, 1 day ago | 0|i05z9r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-914 | QuorumCnxManager blocks forever |
Bug | Resolved | Blocker | Duplicate | Vishal Kher | Vishal Kher | Vishal Kher | 27/Oct/10 15:54 | 12/Nov/10 17:47 | 12/Nov/10 17:47 | leaderElection | 0 | 1 | This was a disaster. While testing our application we ran into a scenario where a rebooted follower could not join the cluster. Further debugging showed that the follower could not join because the QuorumCnxManager on the leader was blocked for an indefinite amount of time in receiveConnection() "Thread-3" prio=10 tid=0x00007fa920005800 nid=0x11bb runnable [0x00007fa9275ed000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233) at sun.nio.ch.IOUtil.read(IOUtil.java:206) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) - locked <0x00007fa93315f988> (a java.lang.Object) at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:210) at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:501) I had pointed out this bug along with several other problems in QuorumCnxManager earlier in https://issues.apache.org/jira/browse/ZOOKEEPER-900 and https://issues.apache.org/jira/browse/ZOOKEEPER-822. I forgot to patch this one as a part of ZOOKEEPER-822. I am working on a fix and a patch will be out soon. The problem is that QuorumCnxManager is using SocketChannel in blocking mode. It does a read() in receiveConnection() and a write() in initiateConnection(). Sorry, but this is really bad programming. It also points to a lack of failure tests for QuorumCnxManager. |
214201 | No Perforce job exists for this issue. | 0 | 32814 | 9 years, 19 weeks, 6 days ago | 0|i05z9z: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-913 | Version parser fails to parse "3.3.2-dev" from build.xml. |
Bug | Closed | Critical | Fixed | Patrick D. Hunt | Anthony Urso | Anthony Urso | 26/Oct/10 02:50 | 23/Nov/11 14:22 | 27/Jan/11 02:45 | 3.3.1 | 3.3.3, 3.4.0 | build | 0 | 2 | Cannot build 3.3.1 from the release tarball due to the VerGen parser's inability to parse "3.3.2-dev". version-info: [java] All version-related parameters must be valid integers! [java] Exception in thread "main" java.lang.NumberFormatException: For input string: "2-dev" [java] at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) [java] at java.lang.Integer.parseInt(Integer.java:481) [java] at java.lang.Integer.parseInt(Integer.java:514) [java] at org.apache.zookeeper.version.util.VerGen.main(VerGen.java:131) [java] Java Result: 1 |
47550 | No Perforce job exists for this issue. | 4 | 32815 | 9 years, 9 weeks ago | 0|i05za7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
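The NumberFormatException above comes from feeding "2-dev" to Integer.parseInt. A lenient parser that strips the qualifier first — a sketch of the kind of fix needed, not the actual VerGen patch — could look like:

```java
// Hypothetical lenient parser for the failure above: strip a qualifier such
// as "-dev" before handing each component to Integer.parseInt, so that
// "3.3.2-dev" yields {3, 3, 2} instead of a NumberFormatException.
public class LenientVersion {
    static int[] parse(String version) {
        // Drop anything from the first '-' on ("3.3.2-dev" -> "3.3.2").
        int dash = version.indexOf('-');
        String numeric = dash >= 0 ? version.substring(0, dash) : version;
        String[] parts = numeric.split("\\.");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }
}
```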
| ZooKeeper | ZOOKEEPER-912 | ZooKeeper client logs trace and debug messages at level INFO |
Improvement | Patch Available | Minor | Unresolved | Michi Mutsuzaki | Anthony Urso | Anthony Urso | 26/Oct/10 02:42 | 05/Feb/20 07:11 | 3.3.1 | 3.7.0, 3.5.8 | java client | 1 | 2 | ZK logs a lot of uninformative trace and debug messages to level INFO. This fuzzes up everything and makes it easy to miss useful log info. | 70775 | No Perforce job exists for this issue. | 2 | 42101 | 3 years, 39 weeks, 2 days ago | 0|i07kkv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-911 | move operations from methods to individual classes |
New Feature | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 21/Oct/10 11:02 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | java client | 0 | 3 | ZOOKEEPER-837, ZOOKEEPER-965 | Copied from my email to the ZK dev list from 2010/05/26: For my current code I'm using zkclient[1] and have also looked at cages[2] for some ZK usage examples. I observed that there's a common pattern to wrap ZK operations in callables and feed them to a "retryUntilConnected" executor. Now my idea is that ZK should already come with operations in classes, e.g.: o.a.z.operation.Create extends Operation implements callable{ private path, data[], acl, createMode public Create( .. all kind of ctors .. ) public call(){ .. move code from Zookeeper.create() here } } Similar classes should be provided for getChildren, delete, exists, getData, getACL, setACL and setData. One could then feed such operations to a ZkExecutor, which has the necessary knowledge about the ZkConnection and can execute a command either synchronously or asynchronously. One could also wrap operations in an ExceptionCatcher to ignore certain Exceptions or in a RetryPolicy. This is only an idea so far, but I wanted to share my thoughts before starting to try it out. (BTW: You can meet me at BerlinBuzzwords.de) [1] http://github.com/sgroschupf/zkclient [2] http://code.google.com/p/cages/ And a reply from Patrick Hunt to my mail: Hi Thomas, you might take a look at this JIRA https://issues.apache.org/jira/browse/ZOOKEEPER-679 there's definitely been interest in this area, however there are some real challenges as well. Most users do end up wrapping the basic api with some code, esp the "retry" metaphor is a common case, so I think it would be valuable. At the same time getting the semantics right is hard (and covering all the corner cases). 
Perhaps you could sync up with Aaron/Chris, I'd personally like to see this go into contrib, but I understand the extra burden the patch process presents -- it may make more sense to rapidly iterate on something like github and then move to contrib once you have something less frequently changing, where the patch issue would be less of a problem (see 679, there's discussion on this there). Regardless which way you take it we'd be happy to work with you. |
70777 | No Perforce job exists for this issue. | 1 | 42102 | 9 years, 5 weeks, 1 day ago | 0|i07kl3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
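The "operations as classes fed to a retryUntilConnected executor" pattern described above can be sketched roughly like this. RetryingExecutor is a hypothetical name; a real implementation would submit command objects (Create, Delete, GetData, ...) that call the ZooKeeper client inside call(), and would retry only on recoverable errors such as CONNECTIONLOSS.

```java
import java.util.concurrent.Callable;

// Illustrative sketch of the idea in the description: each ZooKeeper
// operation becomes a Callable "command" object that a small executor can
// retry until it succeeds. Not ZooKeeper or zkclient source code.
public class RetryingExecutor {
    private final int maxAttempts;

    public RetryingExecutor(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    public <T> T execute(Callable<T> op) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();  // e.g. a Create/Delete/GetData command
            } catch (Exception e) {
                last = e;          // a real executor would rethrow immediately
            }                      // on non-recoverable errors
        }
        throw last;
    }
}
```

An ExceptionCatcher or RetryPolicy, as the email suggests, would slot into the catch block to decide which exceptions are worth another attempt.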
| ZooKeeper | ZOOKEEPER-910 | ZOOKEEPER-835 Use SelectionKey.isXYZ() methods instead of complicated binary logic |
Sub-task | Patch Available | Minor | Unresolved | Michi Mutsuzaki | Thomas Koch | Thomas Koch | 21/Oct/10 09:43 | 05/Feb/20 07:12 | 3.7.0, 3.5.8 | server | 0 | 0 | The SelectionKey class provides methods to replace something like this (k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0 with selectionKey.isReadable() || selectionKey.isWritable() It may be that the first version saves a CPU cycle or two, but the latter version saves developer brain cycles, which are much more expensive. I suppose that there are many more places in the server code where this replacement could be done. I propose that whoever touches a code line like this should make the replacement. |
70743 | No Perforce job exists for this issue. | 1 | 42103 | 1 year, 45 weeks ago | 0|i07klb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
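The equivalence claimed above is easy to check on the raw ready-ops bitmask. checkViaMask is the existing style; checkViaMethods spells out what SelectionKey.isReadable()/isWritable() compute per their javadoc, so on a live key the replacement is simply key.isReadable() || key.isWritable(). The helper names are illustrative.

```java
import java.nio.channels.SelectionKey;

// Side-by-side comparison of the two styles from the description, applied
// to a plain ready-ops bitmask so the equivalence can be checked directly.
public class ReadyOps {
    // Existing style: explicit binary logic on the bitmask.
    static boolean checkViaMask(int readyOps) {
        return (readyOps & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0;
    }

    // What isReadable()/isWritable() compute internally, per their javadoc.
    static boolean checkViaMethods(int readyOps) {
        boolean readable = (readyOps & SelectionKey.OP_READ) != 0;
        boolean writable = (readyOps & SelectionKey.OP_WRITE) != 0;
        return readable || writable;
    }
}
```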
| ZooKeeper | ZOOKEEPER-909 | ZOOKEEPER-823 Extract NIO specific code from ClientCnxn |
Sub-task | Closed | Major | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 21/Oct/10 09:26 | 23/Nov/11 14:22 | 10/Nov/10 17:40 | 3.4.0 | java client | 0 | 1 | This patch is mostly the same patch as my last one for ZOOKEEPER-823 minus everything Netty-related. This means this patch only extracts all NIO-specific code into the class ClientCnxnSocketNIO, which extends ClientCnxnSocket. I've redone this patch from current trunk step by step now and couldn't find any logical error. I've already done a couple of successful test runs and will continue to do so tonight. It would be nice if we could apply this patch as soon as possible to trunk. This allows us to continue to work on the netty integration without blocking the ClientCnxn class. Adding Netty after this patch should be only a matter of adding the ClientCnxnSocketNetty class with the appropriate test cases. You could help me by reviewing the patch and by running it on whatever test server you have available. Please send me any complete failure log you should encounter to thomas at koch point ro. Thx! Update: Until now, I've collected 8 successful builds in a row! |
47551 | No Perforce job exists for this issue. | 6 | 33371 | 9 years, 20 weeks, 1 day ago |
Reviewed
|
netty | 0|i062pr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-908 | ZOOKEEPER-835 Remove code duplication and inconsistent naming in ClientCnxn.Packet creation |
Sub-task | Closed | Minor | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 21/Oct/10 06:02 | 23/Nov/11 14:22 | 11/Nov/10 12:14 | 3.4.0 | server | 0 | 0 | rename record -> request (since there is a counterpart record named "response") rename header -> requestHeader (to distinguish from responseHeader) remove ByteBuffer creation code from primeConnection() method and use the duplicate code in the Packet constructor. Therefore the ByteBuffer bb parameter could also be removed from the constructor's parameters. |
47552 | No Perforce job exists for this issue. | 1 | 33372 | 9 years, 19 weeks, 6 days ago |
Reviewed
|
0|i062pz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-907 | Spurious "KeeperErrorCode = Session moved" messages |
Bug | Closed | Blocker | Fixed | Vishal Kher | Vishal Kher | Vishal Kher | 20/Oct/10 14:27 | 23/Nov/11 14:22 | 04/Nov/10 12:29 | 3.3.1 | 3.3.2, 3.4.0 | 0 | 3 | ZOOKEEPER-915 | The sync request does not set the session owner in Request. As a result, the leader keeps printing: 2010-07-01 10:55:36,733 - INFO [ProcessThread:-1:PrepRequestProcessor@405] - Got user-level KeeperException when processing sessionid:0x298d3b1fa90000 type:sync: cxid:0x6 zxid:0xfffffffffffffffe txntype:unknown reqpath:/ Error Path:null Error:KeeperErrorCode = Session moved |
47553 | No Perforce job exists for this issue. | 2 | 32816 | 9 years, 20 weeks, 6 days ago |
Reviewed
|
0|i05zaf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-906 | Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client |
Improvement | Open | Major | Unresolved | Radu Marin | Radu Marin | Radu Marin | 19/Oct/10 21:45 | 05/Feb/20 07:16 | 3.3.1 | 3.7.0, 3.5.8 | c client | 0 | 3 | 86400 | 86400 | 0% | Currently, when a C client gets disconnected, it retries a couple of hosts (not all) with no delay between attempts, and then if it doesn't succeed it sleeps for 1/3 of the session expiration timeout before trying again. In the worst case the disconnect event can occur after 2/3 of the session expiration timeout has passed, and sleeping for another 1/3 of the session timeout will cause a session loss most of the time. A better approach is to check all hosts, but with a random delay between reconnect attempts. Also, the delay must be independent of the session timeout, so that if we increase the session timeout we also increase the number of available attempts. This improvement covers the case when the C client experiences network problems for a short period of time and is not able to reach any zookeeper hosts. The Java client already uses this logic and works very well. |
0% | 0% | 86400 | 86400 | 67887 | No Perforce job exists for this issue. | 1 | 42104 | 1 year, 7 weeks, 2 days ago | zookeeper c-client | 0|i07klj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
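The randomized, session-timeout-independent delay argued for above might be sketched like this; the one-second cap is an illustrative choice, not a value taken from either client.

```java
import java.util.Random;

// Sketch of the proposed reconnect behavior: try every host, sleeping a
// short random interval between attempts. The delay is independent of the
// session timeout, so raising the timeout buys more reconnect attempts
// rather than one longer sleep. MAX_DELAY_MS is an assumed value.
public class ReconnectDelay {
    static final long MAX_DELAY_MS = 1000;

    // Uniform jitter in [0, MAX_DELAY_MS): quick retries without a
    // thundering herd of clients reconnecting in lockstep.
    static long nextDelayMs(Random rng) {
        return (long) (rng.nextDouble() * MAX_DELAY_MS);
    }
}
```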
| ZooKeeper | ZOOKEEPER-905 | enhance zkServer.sh for easier zookeeper automation-izing |
Improvement | Closed | Minor | Fixed | Nicholas Harteau | Nicholas Harteau | Nicholas Harteau | 19/Oct/10 17:48 | 23/Nov/11 14:22 | 07/Dec/10 14:15 | 3.4.0 | scripts | 0 | 0 | zkServer.sh is good at starting zookeeper and figuring out the right options to pass along. unfortunately if you want to wrap zookeeper startup/shutdown in any significant way, you have to reimplement a bunch of the logic there. the attached patch addresses a couple simple issues: 1. add a 'start-foreground' option to zkServer.sh - this allows things that expect to manage a foregrounded process (daemontools, launchd, etc) to use zkServer.sh instead of rolling their own to launch zookeeper 2. add a 'print-cmd' option to zkServer.sh - rather than launching zookeeper from the script, just give me the command you'd normally use to exec zookeeper. I found this useful when writing automation to start/stop zookeeper as part of smoke testing zookeeper-based applications 3. Deal more gracefully with supplying alternate configuration files to zookeeper - currently the script assumes all config files reside in $ZOOCFGDIR - also useful for smoke testing 4. communicate extra info ("JMX enabled") about zookeeper on STDERR rather than STDOUT (necessary for #2) 5. fixes an issue on macos where readlink doesn't have the '-f' option. |
47554 | No Perforce job exists for this issue. | 1 | 33373 | 9 years, 16 weeks ago |
Reviewed
|
0|i062q7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-904 | super digest is not actually acting as a full superuser |
Bug | Closed | Major | Fixed | Camille Fournier | Camille Fournier | Camille Fournier | 19/Oct/10 16:44 | 23/Nov/11 14:22 | 26/Oct/10 18:31 | 3.3.1 | 3.3.2, 3.4.0 | server | 0 | 2 | The documentation states: New in 3.2: Enables a ZooKeeper ensemble administrator to access the znode hierarchy as a "super" user. In particular no ACL checking occurs for a user authenticated as super. However, if a super user does something like: zk.setACL("/", Ids.READ_ACL_UNSAFE, -1); the super user is now bound by the read-only ACL. This is not what I would expect to see given the documentation. It can be fixed by moving the check for the "super" authId in PrepRequestProcessor.checkACL to before the for(ACL a : acl) loop. |
47555 | No Perforce job exists for this issue. | 2 | 32817 | 9 years, 22 weeks, 1 day ago |
Reviewed
|
0|i05zan: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
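The fix ZOOKEEPER-904 describes is an ordering change: the "super" check must run before the per-entry ACL loop, so a super user is never bound by an ACL it just set. A minimal sketch of that control flow, with simplified stand-in types (ZooKeeper's real code uses `Id`/`ACL` objects and permission bitmasks):

```java
import java.util.List;

// Sketch of the ordering fix in PrepRequestProcessor.checkACL:
// the super-user bypass must come before iterating the ACL entries.
// Strings stand in for ZooKeeper's Id/ACL types for brevity.
public class AclCheck {
    public static boolean checkAcl(List<String> authIds, List<String> acl, String neededPerm) {
        // Super user bypasses all ACL checks -- this must come first,
        // otherwise a restrictive ACL on the node shuts the super user out.
        if (authIds.contains("super")) {
            return true;
        }
        // Ordinary path: any ACL entry granting the permission suffices.
        for (String entry : acl) {
            if (entry.equals(neededPerm)) {
                return true;
            }
        }
        return false;
    }
}
```

With the bypass placed first, a super user passes even against a read-only ACL, matching the documented "no ACL checking occurs for a user authenticated as super" behavior.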
| ZooKeeper | ZOOKEEPER-903 | Create a testing jar with useful classes from ZK test source |
Improvement | Resolved | Major | Implemented | Unassigned | Camille Fournier | Camille Fournier | 18/Oct/10 14:25 | 09/Oct/13 20:30 | 09/Oct/13 20:30 | tests | 0 | 0 | From mailing list: -----Original Message----- From: Benjamin Reed Sent: Monday, October 18, 2010 11:12 AM To: zookeeper-user@hadoop.apache.org Subject: Re: Testing zookeeper outside the source distribution? we should be exposing those classes and releasing them as a testing jar. do you want to open up a jira to track this issue? ben On 10/18/2010 05:17 AM, Anthony Urso wrote: > Anyone have any pointers on how to test against ZK outside of the > source distribution? All the fun classes (e.g. ClientBase) do not make > it into the ZK release jar. > > Right now I am manually running a ZK node for the unit tests to > connect to prior to running my test, but I would rather have something > that ant could reliably > automate starting and stopping for CI. > > Thanks, > Anthony |
36651 | No Perforce job exists for this issue. | 0 | 42105 | 6 years, 24 weeks, 1 day ago | 0|i07klr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-902 | Fix findbug issue in trunk "Malicious code vulnerability" |
Bug | Closed | Minor | Fixed | Flavio Paiva Junqueira | Patrick D. Hunt | Patrick D. Hunt | 18/Oct/10 13:41 | 23/Nov/11 14:21 | 07/Feb/11 14:27 | 3.4.0 | 3.4.0 | quorum, server | 0 | 2 | ZOOKEEPER-900 | https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/970/artifact/trunk/findbugs/zookeeper-findbugs-report.html#Warnings_MALICIOUS_CODE Malicious code vulnerability Warnings Code Warning MS org.apache.zookeeper.server.quorum.LeaderElection.epochGen isn't final but should be |
47556 | No Perforce job exists for this issue. | 9 | 32818 | 9 years, 7 weeks, 3 days ago | 0|i05zav: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-901 | Redesign of QuorumCnxManager |
Improvement | Open | Major | Unresolved | Michael Han | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 17/Oct/10 09:09 | 14/Dec/19 06:08 | 3.3.1 | 3.7.0 | leaderElection | 0 | 10 | ZOOKEEPER-2080, ZOOKEEPER-900 | QuorumCnxManager manages TCP connections between ZooKeeper servers for leader election in replicated mode. We have identified over time a couple of deficiencies that we would like to fix. Unfortunately, fixing these issues requires a little more than just generating a couple of small patches. More specifically, I propose, based on previous discussions with the community, that we reimplement QuorumCnxManager so that we achieve the following: # Establishing connections should not be a blocking operation, and perhaps even more important, it shouldn't prevent the establishment of connections with other servers; # Using a pair of threads per connection is a little messy, and we have seen issues over time due to the creation and destruction of such threads. A more reasonable approach is to have a single thread and a selector. |
70771 | No Perforce job exists for this issue. | 0 | 42106 | 2 years, 51 weeks ago | 0|i07klz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-900 | FLE implementation should be improved to use non-blocking sockets |
Improvement | Open | Major | Unresolved | Martin Kuchta | Vishal Kher | Vishal Kher | 15/Oct/10 10:07 | 14/Dec/19 06:06 | 3.7.0 | 3 | 14 | ZOOKEEPER-932, ZOOKEEPER-933, ZOOKEEPER-934 | ZOOKEEPER-1678, ZOOKEEPER-2164, ZOOKEEPER-902, ZOOKEEPER-901 | From earlier email exchanges: 1. Blocking connects and accepts: a) The first problem is in manager.toSend(). This invokes connectOne(), which does a blocking connect. While testing, I changed the code so that connectOne() starts a new thread called AsyncConnect(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect, connectOne starts a timer. connectOne continues with normal operations if the connection is established before the timer expires; otherwise, when the timer expires it interrupts the AsyncConnect() thread and returns. In this way, I can have an upper bound on the amount of time we need to wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should use a Selector to do non-blocking connects/accepts. I am planning to do that later once we at least have a quick fix for the problem and consensus from others for the real fix (this problem is a big blocker for us). Note that it is OK to do blocking IO in the SenderWorker and RecvWorker threads since they block on IO to the respective peer. b) The blocking IO problem is not just restricted to connectOne(), but also occurs in receiveConnection(). The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection does blocking IO to get the peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back to the peer that had sent the connection request. All of this is happening from the Listener. In short, if a peer fails after initiating a connection, the Listener thread won't be able to accept connections from other peers, because it would be stuck in read() or connectOne(). Also the code has an inherent cycle. 
initiateConnection() and receiveConnection() will have to be very carefully synchronized; otherwise, we could run into deadlocks. This code is going to be difficult to maintain/modify. Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822 |
170 | No Perforce job exists for this issue. | 4 | 32819 | 1 year, 25 weeks, 6 days ago | 0|i05zb3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
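The Selector-based approach that ZOOKEEPER-901 and ZOOKEEPER-900 both advocate can be sketched with a small self-contained example: a non-blocking connect driven by one Selector, with a bounded wait, so a stalled peer cannot pin the Listener thread. This is an illustration of the technique, not ZooKeeper's actual QuorumCnxManager code; the loopback server exists only to make the example runnable.

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Minimal sketch of non-blocking connection establishment: the connect
// returns immediately and a Selector waits (with a timeout) for the
// handshake to finish, instead of blocking the calling thread.
public class NonBlockingConnect {
    public static boolean connectWithTimeout(InetSocketAddress addr, long timeoutMs) throws Exception {
        Selector selector = Selector.open();
        SocketChannel ch = SocketChannel.open();
        ch.configureBlocking(false);
        ch.connect(addr);                       // returns immediately
        ch.register(selector, SelectionKey.OP_CONNECT);
        boolean ok = false;
        if (selector.select(timeoutMs) > 0) {   // bounded wait, never indefinite
            ok = ch.finishConnect();
        }
        ch.close();
        selector.close();
        return ok;
    }

    // Self-contained demo: connect to a loopback listener on an ephemeral port.
    public static boolean demo() throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        try {
            return connectWithTimeout((InetSocketAddress) server.getLocalAddress(), 1000);
        } finally {
            server.close();
        }
    }
}
```

In a full redesign the same Selector would also carry OP_ACCEPT, OP_READ, and OP_WRITE interest for all peers, replacing the per-connection SendWorker/RecvWorker thread pairs with a single event loop.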
| ZooKeeper | ZOOKEEPER-899 | Update Netty version in trunk to 3.2.2 |
Task | Resolved | Major | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 15/Oct/10 08:49 | 17/Sep/11 06:56 | 16/Sep/11 20:09 | 3.5.0 | build | 0 | 1 | ZOOKEEPER-1078 | The patch for ZOOKEEPER-823 already has netty version 3.2.1.Final while trunk still has 3.1.5.GA. Could you please update the netty version in trunk so that we can rule out the version difference as a cause for the failures? Note that the most recent version of netty is already 3.2.2, not 3.2.1 as in ZOOKEEPER-823 - <dependency org="org.jboss.netty" name="netty" conf="default" rev="3.1.5.GA"> + <dependency org="org.jboss.netty" name="netty" conf="default" rev="3.2.1.Final"> |
34432 | No Perforce job exists for this issue. | 4 | 33374 | 8 years, 27 weeks, 5 days ago |
Reviewed
|
netty,maven | 0|i062qf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-898 | C Client might not cleanup correctly during close |
Bug | Closed | Trivial | Fixed | Jared Cantwell | Jared Cantwell | Jared Cantwell | 14/Oct/10 15:35 | 23/Nov/11 14:22 | 28/Oct/10 14:51 | 3.3.2, 3.4.0 | c client | 0 | 1 | I was looking through the c-client code and noticed a situation where a counter can be incorrectly incremented and a small memory leak can occur. In zookeeper.c : add_completion(), if close_requested is true, then the completion will not be queued. But at the end, outstanding_sync is still incremented and free() never called on the newly allocated completion_list_t. I will submit for review a diff that I believe corrects this issue. |
47557 | No Perforce job exists for this issue. | 2 | 32820 | 9 years, 22 weeks ago |
Reviewed
|
0|i05zbb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-897 | C Client seg faults during close |
Bug | Closed | Major | Fixed | Jared Cantwell | Jared Cantwell | Jared Cantwell | 14/Oct/10 15:26 | 23/Nov/11 14:22 | 28/Oct/10 12:25 | 3.3.2, 3.4.0 | c client | 0 | 1 | We observed a crash while closing our C client. It was in the do_io() thread, which was still processing during the close() call. #0 queue_buffer (list=0x6bd4f8, b=0x0, add_to_front=0) at src/zookeeper.c:969 #1 0x000000000046234e in check_events (zh=0x6bd480, events=<value optimized out>) at src/zookeeper.c:1687 #2 0x0000000000462d74 in zookeeper_process (zh=0x6bd480, events=2) at src/zookeeper.c:1971 #3 0x0000000000469c34 in do_io (v=0x6bd480) at src/mt_adaptor.c:311 #4 0x00007ffff7bc59ca in start_thread () from /lib/libpthread.so.0 #5 0x00007ffff6f706fd in clone () from /lib/libc.so.6 #6 0x0000000000000000 in ?? () We tracked down the sequence of events, and the cause is that input_buffer is being freed from a thread other than the do_io thread that relies on it: 1. do_io() calls check_events() 2. if(events&ZOOKEEPER_READ) branch executes 3. if (rc > 0) branch executes 4. if (zh->input_buffer != &zh->primer_buffer) branch executes .....in the meantime...... 5. zookeeper_close() called 6. if (inc_ref_counter(zh,0)!=0) branch executes 7. cleanup_bufs() is called 8. input_buffer is freed at the end ..... back to check_events()......... 9. queue_events() is called on a NULL buffer. I believe the fix is to only call free_completions() in zookeeper_close() and not cleanup_bufs(). The original reason cleanup_bufs() was added was to call any outstanding synchronous completions, so only free_completions (which is guarded) is needed. I will submit a patch for review with this change. |
47558 | No Perforce job exists for this issue. | 2 | 32821 | 9 years, 22 weeks ago |
Reviewed
|
0|i05zbj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-896 | Improve client to support dynamic authentication schemes |
Improvement | Patch Available | Major | Unresolved | Botond Hejj | Botond Hejj | Botond Hejj | 14/Oct/10 08:57 | 05/Feb/20 07:11 | 3.7.0, 3.5.8 | c client, java client | 1 | 6 | ZOOKEEPER-938 | When we started exploring zookeeper for our requirements we found the authentication mechanism is not flexible enough. We want to use Kerberos for authentication, but using the current API we ran into a few problems. The idea is that we get a Kerberos token on the client side and then send that token to the server with a kerberos scheme. A server-side authentication plugin can use that token to authenticate the client and also use the token for authorization. We ran into two problems with this approach: 1. A different Kerberos token is needed for each server that the client can connect to, since Kerberos uses mutual authentication. That means when the client acquires this Kerberos token it has to know which server it connects to and generate the token accordingly. The client currently can't generate a token for a specific server: the token stored in the auth_info is used for all the servers. 2. The Kerberos token might have an expiry time, so if the client loses the connection to the server and then tries to reconnect, it should acquire a new token. That is not possible currently, since the token is stored in auth_info and reused for every connection. The problem can be solved if we allow the client to register a callback for authentication instead of a static token. This can be a callback with an argument which passes the current host string. The zookeeper client code could call this callback before it sends the authentication info to the server to get a fresh, server-specific token. This would solve our problem with Kerberos authentication and could also be used for other, more dynamic authentication schemes. |
70787 | No Perforce job exists for this issue. | 6 | 42107 | 3 years, 31 weeks, 2 days ago | 0|i07km7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
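The callback-based API ZOOKEEPER-896 proposes can be sketched in a few lines: instead of storing a static token, the client holds a provider that is asked for a fresh, server-specific token each time a connection is primed. The interface and method names below are hypothetical, not ZooKeeper's actual API; a real provider would return a Kerberos service ticket for the given host.

```java
// Sketch of a dynamic-auth callback: the client asks the provider for
// a token per host, per connection attempt, so mutual-auth schemes
// like Kerberos (per-server, expiring tickets) work naturally.
// All names here are illustrative, not ZooKeeper's real API.
public class AuthCallbackDemo {
    public interface AuthTokenProvider {
        // Called with the host string of the server being connected to.
        byte[] tokenFor(String hostString);
    }

    // Stand-in for the point in the (hypothetical) client where auth
    // info is sent right after a connection is established.
    public static byte[] primeAuth(AuthTokenProvider provider, String host) {
        return provider.tokenFor(host); // fresh token every reconnect
    }
}
```

Because the provider is invoked on every (re)connection, expired tickets are replaced automatically, addressing both problems the report lists.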
| ZooKeeper | ZOOKEEPER-895 | ClientCnxn.authInfo must be thread safe |
Bug | Resolved | Major | Fixed | Unassigned | Thomas Koch | Thomas Koch | 14/Oct/10 03:25 | 19/Nov/10 12:40 | 19/Nov/10 12:40 | 0 | 1 | ZOOKEEPER-823 | authInfo can be accessed concurrently by different threads, as exercised in org.apache.zookeeper.test.ACLTest The two concurrent access points in this case were (presumably): org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:805) and org.apache.zookeeper.ClientCnxn.addAuthInfo(ClientCnxn.java:1121) The line numbers refer to the latest patch in ZOOKEEPER-823. The exception that pointed to this issue: [junit] 2010-10-13 09:35:55,113 [myid:] - WARN [main-SendThread(localhost:11221):ClientCnxn$SendThread@713] - Session 0x0 for server localhost/127.0.0.1:11221, unexpected error, closing socket connection and attempting reconnect [junit] java.util.ConcurrentModificationException [junit] at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372) [junit] at java.util.AbstractList$Itr.next(AbstractList.java:343) [junit] at org.apache.zookeeper.ClientCnxn$SendThread.primeConnection(ClientCnxn.java:805) [junit] at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:247) [junit] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:694) Proposed solution: Use a thread-safe list for authInfo |
47559 | No Perforce job exists for this issue. | 0 | 32822 | 9 years, 18 weeks, 6 days ago | 0|i05zbr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
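One standard way to implement the proposed thread-safe list is `CopyOnWriteArrayList`, whose iterators work on a snapshot and therefore never throw `ConcurrentModificationException` when another thread appends mid-iteration. A minimal sketch (the class below is an illustration of the technique, not the actual ClientCnxn code):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Sketch of holding authInfo in a thread-safe list so one thread can
// iterate it (as SendThread.primeConnection does) while another thread
// appends via addAuthInfo, without ConcurrentModificationException.
public class AuthInfoDemo {
    private final List<String> authInfo = new CopyOnWriteArrayList<>();

    public void addAuthInfo(String token) {
        authInfo.add(token);
    }

    // The for-each iterator sees a consistent snapshot of the list even
    // though we mutate it inside the loop; a plain ArrayList would
    // throw ConcurrentModificationException here.
    public int countWhileMutating() {
        int seen = 0;
        for (String s : authInfo) {
            authInfo.add(s + "'");
            seen++;
        }
        return seen;
    }
}
```

Copy-on-write is a good fit here because auth entries are written rarely (once per addAuthInfo call) but read on every connection priming.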
| ZooKeeper | ZOOKEEPER-894 | ZOOKEEPER-835 add Package o.a.zookeeper.client |
Sub-task | Open | Major | Unresolved | Unassigned | Thomas Koch | Thomas Koch | 13/Oct/10 05:51 | 13/Oct/10 13:24 | 0 | 1 | I'd like to move classes that are not part of the API but belong to the ZK Client into a separate Client package. These classes are: - Inner classes that should become normal classes: Zookeeper.ZkWatchManager Zookeeper.WatchRegistration ClientCnxn.SendThread (should become a Runnable anyhow) ClientCnxn.EventThread ClientCnxn.Package ClientCnxn.AuthData ? - Classes now in the zookeeper package: ClientCnxn -> Client.Cnxn ClientCnxnSocket* -> Client.CnxnSocket* ... Maybe some others that can be moved without breaking the API - Classes yet to be written: PendingQueue ? OutgoingQueue ? |
36652 | No Perforce job exists for this issue. | 0 | 42108 | 9 years, 24 weeks, 1 day ago | 0|i07kmf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-893 | ZooKeeper high cpu usage when invalid requests |
Bug | Closed | Critical | Fixed | Thijs Terlouw | Thijs Terlouw | Thijs Terlouw | 11/Oct/10 03:15 | 23/Nov/11 14:22 | 19/Oct/10 18:38 | 3.3.1 | 3.3.2, 3.4.0 | server | 0 | 3 | 3600 | 3600 | 0% | ZOOKEEPER-427 | Linux 2.6.16 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz java version "1.6.0_17" Java(TM) SE Runtime Environment (build 1.6.0_17-b04) Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode) |
When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it's possible for ZooKeeper to enter an infinite loop which causes 100% cpu usage. It's related to ZOOKEEPER-427, but that patch does not resolve all issues. from: src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java the two affected parts: =========== int length = msgLength.getInt(); if(length <= 0) { throw new IOException("Invalid packet length:" + length); } =========== =========== while (message.hasRemaining()) { temp_numbytes = channel.read(message); if(temp_numbytes < 0) { throw new IOException("Channel eof before end"); } numbytes += temp_numbytes; } =========== how to replicate this bug: perform an nmap portscan against your zookeeper server: "nmap -sV -n your.ip.here -p4181" wait for a while until you see some messages in the logfile and then you will see 100% cpu usage. It does not recover from this situation. With my patch, it does not occur anymore |
0% | 0% | 3600 | 3600 | 47560 | No Perforce job exists for this issue. | 3 | 32823 | 9 years, 23 weeks, 1 day ago |
Reviewed
|
zookeeper server cpu ZOOKEEPER-427 | 0|i05zbz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
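The length-prefix check quoted in ZOOKEEPER-893 rejects non-positive lengths but not absurdly large ones, and the read loop has no bound on failed attempts. A hedged sketch of the hardening direction (the `MAX_PACKET` limit is an assumed value for illustration, not ZooKeeper's actual constant):

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Sketch of validating an untrusted length prefix before allocating or
// looping on reads: reject non-positive AND oversized values, so a
// hostile or garbage connection (e.g. an nmap probe) fails fast with
// an IOException instead of driving the server into a spin.
public class PacketGuard {
    // Assumed upper bound for illustration only.
    static final int MAX_PACKET = 512 * 1024;

    public static int checkedLength(ByteBuffer msgLength) throws IOException {
        int length = msgLength.getInt();
        if (length <= 0 || length > MAX_PACKET) {
            throw new IOException("Invalid packet length: " + length);
        }
        return length;
    }
}
```

Throwing on out-of-range lengths lets the caller close the offending channel, which is the recovery behavior the report says was missing.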
| ZooKeeper | ZOOKEEPER-892 | Remote replication of Zookeeper data |
New Feature | Open | Major | Unresolved | Anirban Roy | Anirban Roy | Anirban Roy | 08/Oct/10 06:38 | 14/Dec/19 06:08 | 3.4.0 | 3.7.0 | server | 15/May/11 | 3 | 18 | 9676800 | 9676800 | 0% | [root@llf531123 Zookeeper]# uname -a Linux llf531123.crawl.yahoo.net 2.6.9-67.0.22.ELsmp #1 SMP Fri Jul 11 10:37:57 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux [root@llf531123 Zookeeper]# java -version java version "1.6.0_03" Java(TM) SE Runtime Environment (build 1.6.0_03-b05) Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode) [root@llf531123 Zookeeper]# |
ZooKeeper is a highly available and scalable system for distributed synchronization and is frequently used for cluster management. In its current incarnation it has issues with communication and data replication across extended geographic locations. Presently, the only way to distribute ZooKeeper across multiple data centers is to maintain a cross-colo Quorum using Observer members, leading to unnecessary consumption of bandwidth and performance impacts. As the title suggests, this work aims to provide replication of ZooKeeper data from one site to others using a new type of ZooKeeper member called a Publisher. The broad idea is to have a complete instance of the current ZooKeeper at each geographic location in a master-slave setup. The Publisher will be part of the master ZooKeeper site and will push changes to a FIFO queue, making them available to any interested client. The slave ZooKeeper runs a client application called Replicator, which receives the changes and replays them on the slave instance. Multiple slave Replicators can subscribe to the master Publisher and receive changes with guaranteed ordering. It will be asynchronous, non-intrusive, loosely coupled, and can be applied to a subset of the data. This scheme will bring about many of the benefits of database replication, such as resilience to site failure and localized serving across data centers. In short, the goal is to provide remote (sub-tree) data replication with guaranteed ordering, without affecting the master ZooKeeper's performance. | 0% | 0% | 9676800 | 9676800 | 37 | No Perforce job exists for this issue. | 3 | 42109 | 6 years, 45 weeks, 2 days ago | ZOOKEEPER-892. Remote replication of ZooKeeper data (Anirban Roy) | zkrepl replication zoorepl | 0|i07kmn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-891 | Allow non-numeric version strings |
Improvement | Closed | Minor | Duplicate | Unassigned | Eli Collins | Eli Collins | 07/Oct/10 21:38 | 23/Nov/11 14:22 | 09/Nov/10 18:17 | 3.4.0 | build | 0 | 0 | Non-numeric version strings (eg -dev) are not currently accepted: you either get an error (Invalid version number format, must be "x.y.z"), or if you pass x.y.z-dev or x.y.z+1 you'll get a NumberFormatException. It would be useful to allow non-numeric versions. {noformat} version-info: [java] All version-related parameters must be valid integers! [java] Exception in thread "main" java.lang.NumberFormatException: For input string: "3-dev" [java] at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) [java] at java.lang.Integer.parseInt(Integer.java:458) [java] at java.lang.Integer.parseInt(Integer.java:499) [java] at org.apache.zookeeper.version.util.VerGen.main(VerGen.java:131) [java] Java Result: 1 {noformat} |
214200 | No Perforce job exists for this issue. | 0 | 33375 | 9 years, 20 weeks, 2 days ago | 0|i062qn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
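One lenient-parsing approach for ZOOKEEPER-891 is to read only the leading digits of each version component, so a suffix like "-dev" is tolerated instead of causing `NumberFormatException`. This is a sketch of the idea, not VerGen's actual fix:

```java
// Sketch of lenient version-component parsing: take the leading digit
// run of a component such as "3-dev" and parse just that, instead of
// handing the whole string to Integer.parseInt.
public class VerParse {
    public static int parseComponent(String s) {
        int i = 0;
        while (i < s.length() && Character.isDigit(s.charAt(i))) {
            i++;
        }
        if (i == 0) {
            // Still reject components with no numeric part at all.
            throw new NumberFormatException("No leading digits in: " + s);
        }
        return Integer.parseInt(s.substring(0, i));
    }
}
```

With this, "3-dev" parses as 3 while the suffix can be carried along separately as an opaque qualifier string.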
| ZooKeeper | ZOOKEEPER-890 | C client invokes watcher callbacks multiple times |
Bug | Resolved | Critical | Not A Problem | Unassigned | Austin Bennett | Austin Bennett | 07/Oct/10 03:53 | 13/Oct/10 13:11 | 13/Oct/10 13:11 | 3.3.1 | c client | 0 | 0 | ZOOKEEPER-888 | Mac OS X 10.6.5 | Code using the C client assumes that watcher callbacks are called exactly once. If the watcher is called more than once, the process will likely overwrite freed memory and/or crash. collect_session_watchers (zk_hashtable.c) gathers watchers from active_node_watchers, active_exist_watchers, and active_child_watchers without removing them. This results in watchers being invoked more than once. Test code is attached that reproduces the bug, along with a proposed patch. |
214199 | No Perforce job exists for this issue. | 2 | 32824 | 9 years, 24 weeks, 1 day ago | 0|i05zc7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-889 | pyzoo_aget_children crashes due to incorrect watcher context |
Bug | Resolved | Critical | Fixed | Unassigned | Austin Bennett | Austin Bennett | 07/Oct/10 00:16 | 07/Oct/10 00:19 | 07/Oct/10 00:19 | 3.3.1 | contrib-bindings | 0 | 1 | OS X 10.6.5, Python 2.6.1 | The pyzoo_aget_children function passes the completion callback ("pyw") in place of the watcher callback ("get_pyw"). Since it is a one-shot callback, it is deallocated after the completion callback fires, causing a crash when the watcher callback should be invoked. |
47561 | No Perforce job exists for this issue. | 1 | 32825 | 9 years, 25 weeks ago | 0|i05zcf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-888 | c-client / zkpython: Double free corruption on node watcher |
Bug | Closed | Critical | Fixed | Lukas | Lukas | Lukas | 06/Oct/10 09:26 | 23/Nov/11 14:22 | 19/Oct/10 15:02 | 3.3.1 | 3.3.2, 3.3.3, 3.4.0 | c client, contrib-bindings | 1 | 3 | ZOOKEEPER-890, ZOOKEEPER-740 | The c-client / zkpython wrapper invokes an already-freed watcher callback. Steps to reproduce: 0. start a ZooKeeper server on your machine 1. run the attached python script 2. suspend the ZooKeeper server process (e.g. using `pkill -STOP -f org.apache.zookeeper.server.quorum.QuorumPeerMain` ) 3. wait until the connection and the node observer fired with a session event 4. resume the ZooKeeper server process (e.g. using `pkill -CONT -f org.apache.zookeeper.server.quorum.QuorumPeerMain` ) -> the client tries to dispatch the node observer function again, but it was already freed -> double free corruption |
47562 | No Perforce job exists for this issue. | 3 | 32826 | 9 years, 23 weeks, 1 day ago |
Reviewed
|
0|i05zcn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-887 | Bug at - Producer-Consumer Example |
Bug | Open | Minor | Unresolved | Unassigned | sanjivsingh | sanjivsingh | 06/Oct/10 00:26 | 08/Sep/16 02:07 | java client | 1 | 2 | I tried to test the Producer-Consumer example published at http://hadoop.apache.org/zookeeper/docs/r3.0.0/zookeeperTutorial.html Queue.produce(int p) works correctly, but there is a problem in the Queue.consume() method: int consume() throws KeeperException, InterruptedException{ int retvalue = -1; Stat stat = null; // Get the first element available while (true) { synchronized (mutex) { List<String> list = zk.getChildren(root, true); if (list.size() == 0) { System.out.println("Going to wait"); mutex.wait(); } else { Integer min = new Integer(list.get(0).substring(7)); for(String s : list){ Integer tempValue = new Integer(s.substring(7)); //System.out.println("Temporary value: " + tempValue); if(tempValue < min) min = tempValue; } System.out.println("Temporary value: " + root + "/element" + min); byte[] b = zk.getData(root + "/element" + min, false, stat); zk.delete(root + "/element" + min, 0); ByteBuffer buffer = ByteBuffer.wrap(b); retvalue = buffer.getInt(); return retvalue; } } } } What produce() does is add children under the root with names like element000000001, element000000002, element000000003, and so on. But in consume(): 1. Integer min = new Integer(list.get(0).substring(7)); 2. for(String s : list){ 3. Integer tempValue = new Integer(s.substring(7)); 4. if(tempValue < min) min = tempValue; 5. } 6. byte[] b = zk.getData(root + "/element" + min, false, stat); 7. zk.delete(root + "/element" + min, 0); Because lines 1 and 3 convert a String like "000000001" to the Integer 1, lines 6 and 7 end up trying to access a znode like root + "/element1" rather than root + "/element000000001", which definitely does not exist. I am putting forward a solution: 
int consume() throws KeeperException, InterruptedException{ int retvalue = -1; Stat stat = null; // Get the first element available while (true) { synchronized (mutex) { List<String> list = zk.getChildren(root, true); if (list.size() == 0) { System.out.println("Going to wait"); mutex.wait(); } else { // track the index of the smallest element so its full zero-padded name is kept int min = new Integer(list.get(0).substring(7)); int i = 0, p = 0; for(String s : list){ int tempValue = new Integer(s.substring(7)); if(tempValue < min) { min = tempValue; p = i; } i++; } byte[] b = zk.getData(root + "/element" + list.get(p).substring(7), false, stat); zk.delete(root + "/element" + list.get(p).substring(7), 0); ByteBuffer buffer = ByteBuffer.wrap(b); retvalue = buffer.getInt(); return retvalue; } } } } |
36653 | No Perforce job exists for this issue. | 1 | 32827 | 3 years, 28 weeks ago | 0|i05zcv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
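The core of the ZOOKEEPER-887 fix is selecting the minimum child by its numeric suffix while keeping the full zero-padded znode name for the subsequent getData/delete path. That selection logic can be isolated and tested on its own; this is a sketch of the technique, not the tutorial's exact code:

```java
import java.util.List;

// Sketch of min-selection over sequential znode names: compare the
// numeric suffixes, but return the full child name (padding intact)
// so callers build a path that actually exists.
public class QueueMin {
    public static String minElement(List<String> children) {
        String minName = children.get(0);
        int min = Integer.parseInt(minName.substring(7)); // "element" is 7 chars
        for (String s : children) {
            int v = Integer.parseInt(s.substring(7));
            if (v < min) {
                min = v;
                minName = s;
            }
        }
        return minName; // e.g. "element000000001", a real znode name
    }
}
```

Returning the name rather than the parsed integer is exactly what prevents the `/element1` vs `/element000000001` mismatch described in the report.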
| ZooKeeper | ZOOKEEPER-886 | Hedwig Server stays in "disconnected" state when connection to ZK dies but gets reconnected |
Bug | Resolved | Major | Fixed | Erwin Tam | Erwin Tam | Erwin Tam | 05/Oct/10 16:18 | 12/Oct/10 06:52 | 11/Oct/10 16:55 | contrib-hedwig | 0 | 1 | The Hedwig Server is connected to ZooKeeper. In the ZkTopicManager, it registers a watcher so that if it ever gets disconnected from ZK, it will temporarily fail all incoming requests since the Hedwig server does not know for sure if it is still the master for the topics. When the ZK client gets reconnected, the logic currently is wrong and it does not unset the suspended flag. Thus once it gets disconnected, it will stay in the suspended state forever, thereby making the Hedwig server hub dead. | 47563 | No Perforce job exists for this issue. | 1 | 32828 | 9 years, 24 weeks, 2 days ago |
Reviewed
|
0|i05zd3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-885 | Zookeeper drops connections under moderate IO load |
Bug | Open | Major | Unresolved | Unassigned | Alexandre Hardy | Alexandre Hardy | 01/Oct/10 10:43 | 14/Dec/19 06:07 | 3.2.2, 3.3.1 | 3.7.0 | server | 4 | 18 | Debian (Lenny) 1Gb RAM swap disabled 100Mb heap for zookeeper |
A ZooKeeper server under minimal load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load. In a specific test example we had three ZooKeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the ZooKeeper servers with the command: {noformat} dd if=/dev/urandom of=/dev/mapper/nimbula-test {noformat} The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with {noformat} dd if=/dev/zero of=/dev/mapper/nimbula-test {noformat} It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we had experienced. Clients performed no other read or mutation operations. Although the documentation states that minimal competing IO load should be present on the ZooKeeper server, it seems reasonable that moderate IO should not cause problems in this case. |
36654 | No Perforce job exists for this issue. | 5 | 32829 | 3 years, 50 weeks, 1 day ago | 0|i05zdb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-884 | Remove LedgerSequence references from BookKeeper documentation and comments in tests |
Bug | Closed | Major | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 01/Oct/10 05:11 | 23/Nov/11 14:22 | 05/Nov/10 01:18 | 3.3.1 | 3.4.0 | contrib-bookkeeper | 0 | 1 | We no longer use LedgerSequence, so we need to remove references in documentation and comments sprinkled throughout the code. | 47564 | No Perforce job exists for this issue. | 1 | 32830 | 9 years, 20 weeks, 6 days ago |
Reviewed
|
0|i05zdj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-883 | Idle cluster increasingly consumes CPU resources |
Bug | Resolved | Major | Implemented | Unassigned | Lars George | Lars George | 30/Sep/10 04:38 | 09/Oct/13 20:33 | 09/Oct/13 20:33 | 3.3.1 | server | 0 | 1 | ZOOKEEPER-934, ZOOKEEPER-880 | Monitoring the ZooKeeper nodes by polling the various ports using Nagios' open port checks seems to cause a substantial raise of CPU being used by the ZooKeeper daemons. Over the course of a week an idle cluster grew from a baseline 2% to >10% CPU usage. Attached is a stack dump and logs showing the occupied threads. At the end the daemon starts failing on "too many open files" errors as all handles are used up. | 36655 | No Perforce job exists for this issue. | 1 | 32831 | 6 years, 24 weeks, 1 day ago | 0|i05zdr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-882 | Startup loads last transaction from snapshot |
Bug | Closed | Minor | Fixed | Jared Cantwell | Jared Cantwell | Jared Cantwell | 28/Sep/10 19:46 | 23/Nov/11 14:22 | 23/Dec/10 07:43 | 3.3.3, 3.4.0 | server | 0 | 1 | On startup, the server first loads the latest snapshot, and then loads from the log starting at the last transaction in the snapshot. It should begin from one past that last transaction in the log. I will attach a possible patch. | 47565 | No Perforce job exists for this issue. | 5 | 32832 | 9 years, 13 weeks, 6 days ago | 0|i05zdz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-881 | ZooKeeperServer.loadData loads database twice |
Bug | Closed | Trivial | Fixed | Jared Cantwell | Jared Cantwell | Jared Cantwell | 28/Sep/10 19:41 | 23/Nov/11 14:21 | 18/Oct/10 14:30 | 3.3.2, 3.4.0 | server | 0 | 1 | zkDb.loadDataBase() is called twice at the beginning of loadData(). It shouldn't have any negative effects, but is unnecessary. A patch should be trivial. | 47566 | No Perforce job exists for this issue. | 1 | 32833 | 9 years, 23 weeks, 3 days ago |
Reviewed
|
0|i05ze7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-880 | QuorumCnxManager$SendWorker grows without bounds |
Bug | Closed | Blocker | Fixed | Vishal Kher | Jean-Daniel Cryans | Jean-Daniel Cryans | 27/Sep/10 19:40 | 23/Nov/11 14:22 | 16/Mar/11 14:49 | 3.4.0 | 3.4.0 | 0 | 4 | ZOOKEEPER-934, ZOOKEEPER-883, ZOOKEEPER-939 | We're seeing an issue where one server in the ensemble has a steady growing number of QuorumCnxManager$SendWorker threads up to a point where the OS runs out of native threads, and at the same time we see a lot of exceptions in the logs. This is on 3.2.2 and our config looks like: {noformat} tickTime=3000 dataDir=/somewhere_thats_not_tmp clientPort=2181 initLimit=10 syncLimit=5 server.0=sv4borg9:2888:3888 server.1=sv4borg10:2888:3888 server.2=sv4borg11:2888:3888 server.3=sv4borg12:2888:3888 server.4=sv4borg13:2888:3888 {noformat} The issue is on the first server. I'm going to attach threads dumps and logs in moment. |
47567 | No Perforce job exists for this issue. | 9 | 32834 | 9 years, 1 week, 6 days ago |
Reviewed
|
0|i05zef: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-879 | ZOOKEEPER-835 outgoingQueue should be a class |
Sub-task | Open | Major | Unresolved | Unassigned | Thomas Koch | Thomas Koch | 23/Sep/10 04:52 | 23/Sep/10 12:08 | 0 | 0 | I'm not yet 100% sure about this, but it seems reasonable to me. Currently outgoingQueue is a simple list. Whether additional items can be added to the queue and the logic to add something to the queue are handled by ClientCnxn. class OutgoingQueue - isOpen + add(Packet) / offer(Packet) + poll() / take() OutgoingQueue must have knowledge about the state of SendThread and may only accept additional Packets if SendThread has not yet terminated. OutgoingQueue knows when it must call ConnectionLoss on the remaining Packets in its queue. |
40361 | No Perforce job exists for this issue. | 0 | 42110 | 9 years, 27 weeks ago | 0|i07kmv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
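A toy version of the proposed class, with the responsibilities the report lists — the queue itself tracks whether SendThread is still alive, refuses new packets afterwards, and hands back the packets that must fail with ConnectionLoss. All names and signatures here are guesses at the proposal, not the eventual patch:

```python
from collections import deque

class ConnectionLossError(Exception):
    """Stand-in for the client's connection-loss error."""

class OutgoingQueue:
    """Sketch of the proposed queue-with-state (illustrative only)."""

    def __init__(self):
        self._packets = deque()
        self._open = True

    def add(self, packet):
        """Accept a packet only while SendThread is still running."""
        if not self._open:
            raise ConnectionLossError("SendThread terminated")
        self._packets.append(packet)

    def poll(self):
        """Take the next packet to send, or None if the queue is empty."""
        return self._packets.popleft() if self._packets else None

    def close(self):
        """Mark the queue closed; return the packets that must now
        be failed with ConnectionLoss instead of being sent."""
        self._open = False
        failed, self._packets = list(self._packets), deque()
        return failed
```

The point of the design is that ClientCnxn no longer has to coordinate "is the queue still accepting?" separately from the queue's contents; the invariant lives in one place.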
| ZooKeeper | ZOOKEEPER-878 | ZOOKEEPER-835 finishPacket and conLossPacket should be methods of Packet |
Sub-task | Open | Minor | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 23/Sep/10 04:36 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | server | 0 | 0 | Those methods change the inner state of Packet, work on Packet so they should better be methods of class Packet. This may help to clarify synchronization. | 70768 | No Perforce job exists for this issue. | 2 | 42111 | 9 years, 16 weeks, 2 days ago | 0|i07kn3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-877 | zkpython does not work with python3.1 |
Bug | Closed | Major | Fixed | Daniel Enman | TuxRacer | TuxRacer | 22/Sep/10 06:44 | 13/Mar/14 14:16 | 08/Oct/13 02:46 | 3.3.1 | 3.4.6, 3.5.0 | contrib-bindings | 0 | 5 | linux+python3.1 | as written in the contrib/zkpython/README file: "Python >= 2.6 is required. We have tested against 2.6. We have not tested against 3.x." this is probably more a 'new feature' request than a bug; anyway compiling the python module and calling it returns an error at load time: python3.1 Python 3.1.2 (r312:79147, May 8 2010, 16:36:46) [GCC 4.4.4] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import zookeeper Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: /usr/local/lib/python3.1/dist-packages/zookeeper.so: undefined symbol: PyString_AsString is there any plan to support Python 3.x? I also tried to write a 3.1 ctypes wrapper but the C API seems in fact to be written in C++, so python ctypes cannot be used. |
70733 | No Perforce job exists for this issue. | 8 | 2586 | 6 years, 2 weeks ago |
Reviewed
|
0|i00sq7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-876 | Unnecessary snapshot transfers between new leader and followers |
Bug | Resolved | Minor | Fixed | Diogo | Diogo | Diogo | 21/Sep/10 09:37 | 01/Jul/13 13:28 | 01/Jul/13 13:28 | 3.4.0 | 3.5.0 | 0 | 3 | ZOOKEEPER-874, ZOOKEEPER-1413 | When starting a new leadership, unnecessary snapshot transfers happen between new leader and followers. This is so because of multiple small bugs. 1) the comparison of zxids is done based on a new proposal, instead of the last logged zxid. (LearnerHandler.java ~ 297) 2) if follower is one zxid behind, the check of the interval of committed logs excludes the follower. (LearnerHandler.java ~ 277) 3) the bug reported in ZOOKEEPER-874 (commitLogs are empty after recover). |
67885 | No Perforce job exists for this issue. | 4 | 32835 | 6 years, 38 weeks, 3 days ago | 0|i05zen: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
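The three bugs above all feed one decision: whether the new leader sends a follower a diff of proposals or a full snapshot. A simplified sketch of the intended logic (purely illustrative; the real code lives in LearnerHandler): compare against the leader's last *logged* zxid rather than a new proposal, and treat the committed-log interval as inclusive, so a follower one zxid behind still gets a diff:

```python
def sync_strategy(follower_zxid, leader_last_zxid, min_committed, max_committed):
    """Decide how a new leader syncs a follower (simplified sketch).

    The reported bugs came from (1) comparing against a fresh proposal's
    zxid instead of the last logged zxid and (2) excluding a follower
    sitting exactly on the edge of the committed-log interval. Inclusive
    bounds against the last logged zxid avoid the needless snapshot.
    """
    if follower_zxid == leader_last_zxid:
        return "EMPTY_DIFF"            # follower is already up to date
    if min_committed <= follower_zxid <= max_committed:
        return "DIFF"                  # send only the missing proposals
    return "SNAP"                      # too far behind: full snapshot
```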
| ZooKeeper | ZOOKEEPER-875 | ResponderThread and udpSocket should be move from QuorumPeer to LeaderElection |
Improvement | Open | Trivial | Unresolved | Unassigned | Diogo | Diogo | 17/Sep/10 13:15 | 17/Sep/10 13:15 | 3.3.1 | leaderElection | 0 | 0 | Part of the algorithm implemented in the class LeaderElection is inside QuorumPeer. Is there any reason for that? ResponderThread and udpSocket belong to LeaderElection class and should be moved in LeaderElection.java. That would make the code look cleaner. | 50553 | No Perforce job exists for this issue. | 0 | 42112 | 9 years, 27 weeks, 6 days ago | 0|i07knb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-874 | FileTxnSnapLog.restore does not call listener |
Bug | Closed | Trivial | Fixed | Diogo | Diogo | Diogo | 17/Sep/10 12:01 | 01/May/13 22:29 | 13/Apr/11 12:10 | 3.3.1 | 3.4.0 | leaderElection | 0 | 2 | ZOOKEEPER-876 | FileTxnSnapLog.restore() does not call listener passed as parameter. The result is that the commitLogs list is empty. When a follower connects to the leader, the leader is forced to send a snapshot to the follower instead of a couple of requests and commits. | 47568 | No Perforce job exists for this issue. | 1 | 32836 | 8 years, 50 weeks ago | leader election | 0|i05zev: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-873 | Performance oriented leader election (POLE) |
New Feature | Open | Minor | Unresolved | Unassigned | Diogo | Diogo | 15/Sep/10 11:57 | 25/Jul/12 14:46 | 1 | 1 | ZOOKEEPER-869 | Currently, the leader is elected based on the length of its history. In heterogeneous settings, other processes can be better suited to serve as a leader, e.g., the process running on the node with best links to a majority. POLE (Performance Oriented Leader Election) will be a leader election implementation that takes into account multiple factors when selecting the leader. |
50554 | No Perforce job exists for this issue. | 0 | 42113 | 7 years, 35 weeks, 1 day ago | leader election | 0|i07knj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
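The idea can be caricatured in a few lines: score each candidate by a weighted mix of performance factors instead of history length alone. Everything here — factor names, weights, the scoring shape — is illustrative; POLE itself is not specified in this issue:

```python
def elect_leader(candidates, weights):
    """Pick a leader by weighted score rather than history alone.

    candidates: server id -> dict of factor values (e.g. history length,
    link quality to a majority). weights: factor name -> weight.
    All names are hypothetical stand-ins for whatever POLE would measure.
    """
    def score(sid):
        return sum(weights[f] * v for f, v in candidates[sid].items())
    return max(candidates, key=score)
```

With history weighted lightly and link quality heavily, a server with a slightly shorter history but much better links wins, which is exactly the heterogeneous-cluster case the report describes.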
| ZooKeeper | ZOOKEEPER-872 | Small fixes to PurgeTxnLog |
Bug | Open | Minor | Unresolved | Vishal Kher | Vishal Kher | Vishal Kher | 14/Sep/10 21:51 | 05/Feb/20 07:16 | 3.3.1 | 3.7.0, 3.5.8 | 0 | 1 | PurgeTxnLog forces us to have at least 2 backups (by having count >= 3. Also, it prints to stdout instead of using Logger. | 38 | No Perforce job exists for this issue. | 2 | 32837 | 8 years, 13 weeks, 1 day ago | 0|i05zf3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
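The retention arithmetic at issue is simple. This sketch keeps the newest N snapshots and returns the rest for deletion, accepting any count >= 1 rather than the hard-coded count >= 3 the report complains about. Representing `snapshots` as `(zxid, filename)` pairs is an assumption, not PurgeTxnLog's real interface:

```python
def files_to_purge(snapshots, retain_count):
    """Return the snapshot files to delete, keeping the newest N.

    snapshots: list of (zxid, filename) pairs in any order.
    retain_count: how many of the newest snapshots to keep (>= 1).
    """
    if retain_count < 1:
        raise ValueError("must retain at least one snapshot")
    ordered = sorted(snapshots, reverse=True)        # newest first
    return [name for _, name in ordered[retain_count:]]
```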
| ZooKeeper | ZOOKEEPER-871 | ClientTest testClientCleanup is failing due to high fd count. |
Bug | Resolved | Blocker | Cannot Reproduce | Unassigned | Mahadev Konar | Mahadev Konar | 14/Sep/10 18:39 | 08/Oct/13 18:55 | 08/Oct/13 18:55 | 0 | 0 | The fd counts has increased. The tests are repeatedly failing on hudson machines. I probably think this is related to netty server changes. We have to fix this before we release 3.4 | 70741 | No Perforce job exists for this issue. | 0 | 32838 | 6 years, 24 weeks, 2 days ago | 0|i05zfb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-870 | Zookeeper trunk build broken. |
Bug | Closed | Major | Fixed | Mahadev Konar | Mahadev Konar | Mahadev Konar | 14/Sep/10 18:16 | 23/Nov/11 14:22 | 15/Sep/10 01:57 | 3.4.0 | 0 | 1 | the zookeeper current trunk build is broken mostly due to some netty changes. This is causing a huge backlog of PA's and other impediments to the review process. For now I plan to disable the test and fix them as part of 3.4 later. | 47569 | No Perforce job exists for this issue. | 2 | 32839 | 9 years, 28 weeks, 1 day ago |
Reviewed
|
0|i05zfj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-869 | Support for election of leader with arbitrary zxid |
New Feature | Open | Minor | Unresolved | Unassigned | Diogo | Diogo | 14/Sep/10 03:39 | 21/Sep/10 09:44 | 0 | 0 | ZOOKEEPER-873 | Currently, the leader election algorithm implemented guarantees that the leader has the maximum zxid of the ensemble. The state synchronization after the election was built based on this assumption. However, other leader elections algorithms might elect leaders with arbitrary zxid. To support other leader election algorithms, the state synchronization should allow the leader to have an arbitrary zxid. |
214198 | No Perforce job exists for this issue. | 0 | 42114 | 9 years, 27 weeks, 2 days ago | leader election | 0|i07knr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-868 | ZOOKEEPER-835 Cleanups from ZOOKEEPER-823 patch |
Sub-task | Open | Major | Unresolved | Unassigned | Ivan Kelly | Ivan Kelly | 08/Sep/10 13:19 | 08/Sep/10 13:23 | 0 | 0 | 214197 | No Perforce job exists for this issue. | 0 | 42115 | 9 years, 29 weeks, 1 day ago | 0|i07knz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-867 | ClientTest is failing on hudson - fd cleanup |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 07/Sep/10 03:29 | 23/Nov/11 14:22 | 14/Sep/10 19:22 | 3.4.0 | 3.3.2, 3.4.0 | tests | 0 | 1 | client cleanup test is failing on hudson. fd count is off. | 47570 | No Perforce job exists for this issue. | 1 | 32840 | 9 years, 28 weeks, 1 day ago | 0|i05zfr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-866 | Adding no disk persistence option in zookeeper. |
New Feature | Open | Major | Unresolved | Mahadev Konar | Mahadev Konar | Mahadev Konar | 04/Sep/10 20:12 | 14/Dec/19 06:06 | 3.7.0 | 6 | 13 | ZOOKEEPER-1777 | It's been seen that some folks would like to use zookeeper for very fine-grained locking. Also, in their use case they are fine with losing all old zookeeper state if they reboot zookeeper or zookeeper goes down. The use case is more of a runtime locking wherein forgetting the state of locks is acceptable in case of a zookeeper reboot. Not logging to disk allows high throughput and low latency on writes to zookeeper. This would be a configuration option to set (of course the default would be logging to disk). |
67211 | No Perforce job exists for this issue. | 1 | 42116 | 6 years, 20 weeks, 1 day ago | 0|i07ko7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-865 | Runaway thread |
Bug | Open | Critical | Unresolved | Unassigned | Stephen McCants | Stephen McCants | 03/Sep/10 11:07 | 03/Sep/10 11:09 | 3.3.0, 3.3.1 | 0 | 2 | ZOOKEEPER-863 | Linux; Java 1.6; x86; | I'm starting a standalone Zookeeper server (v3.3.1). That starts normally and does not have a runaway thread. Next, I start an Eclipse-based application that is using ZK 3.3.0 to register itself with the ZooKeeper server (3.3.1). The Eclipse application passes the following arguments to Eclipse: -Dzoodiscovery.autoStart=true -Dzoodiscovery.flavor=zoodiscovery.flavor.centralized=smccants.austin.ibm.com When the Eclipse application starts, the ZK server prints out: 2010-09-03 09:59:46,006 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /9.53.189.11:42271 2010-09-03 09:59:46,039 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@776] - Client attempting to establish new session at /9.53.189.11:42271 2010-09-03 09:59:46,045 - INFO [SyncThread:0:NIOServerCnxn@1579] - Established session 0x12ad81b90000002 with negotiated timeout 4000 for client /9.53.189.11:42271 2010-09-03 09:59:46,046 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /9.53.189.11:42272 2010-09-03 09:59:46,078 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@776] - Client attempting to establish new session at /9.53.189.11:42272 2010-09-03 09:59:46,080 - INFO [SyncThread:0:NIOServerCnxn@1579] - Established session 0x12ad81b90000003 with negotiated timeout 4000 for client /9.53.189.11:42272 Then both the Eclipse application and the ZK server go into runaway states and consume 100% of the CPU. 
Here is a view from top: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 4949 smccants 15 0 597m 78m 5964 S 66.2 1.0 1:03.14 autosubmitter 4876 smccants 17 0 554m 27m 6688 S 30.9 0.3 0:34.74 java PID 4949 (autosubmitter) is the Eclipse application and is using more than twice the CPU of PID 4876 (java) which is the ZK server. They will continue in this state indefinitely. I can attach a debugger to the Eclipse application and if I stop the thread named "pool-1-thread-2-SendThread(smccants.austin.ibm.com:2181)" and the runaway condition stops on both the application and ZK server. However the ZK server reports: 2010-09-03 10:03:38,001 - INFO [SessionTracker:ZooKeeperServer@315] - Expiring session 0x12ad81b90000003, timeout of 4000ms exceeded 2010-09-03 10:03:38,002 - INFO [ProcessThread:-1:PrepRequestProcessor@208] - Processed session termination for sessionid: 0x12ad81b90000003 2010-09-03 10:03:38,005 - INFO [SyncThread:0:NIOServerCnxn@1434] - Closed socket connection for client /9.53.189.11:42272 which had sessionid 0x12ad81b90000003 Here is the stack trace from the suspended thread: EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method] EPollArrayWrapper.poll(long) line: 215 EPollSelectorImpl.doSelect(long) line: 77 EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69 EPollSelectorImpl(SelectorImpl).select(long) line: 80 ClientCnxn$SendThread.run() line: 1066 Any ideas what might be going wrong? Thanks. |
214196 | No Perforce job exists for this issue. | 0 | 32841 | 9 years, 29 weeks, 6 days ago | 0|i05zfz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-864 | Hedwig C++ client improvements |
Improvement | Closed | Major | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 03/Sep/10 10:42 | 23/Nov/11 14:22 | 11/Oct/10 15:01 | 3.4.0 | 0 | 1 | I changed the socket code to use boost asio. Now the client only creates one thread, and all operations are non-blocking. Tests are now automated, just run "make check". |
47571 | No Perforce job exists for this issue. | 5 | 33376 | 9 years, 24 weeks, 2 days ago |
Reviewed
|
0|i062qv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-863 | Runaway thread - Zookeeper inside Eclipse |
Bug | Open | Critical | Unresolved | Unassigned | Stephen McCants | Stephen McCants | 03/Sep/10 10:33 | 03/Sep/10 14:52 | 3.3.0 | 0 | 2 | ZOOKEEPER-865 | Linux; x86 | I'm running Zookeeper inside an Eclipse application. When I launch the application from inside Eclipse I use the following arguments: -Dzoodiscovery.autoStart=true -Dzoodiscovery.flavor=zoodiscovery.flavor.centralized=localhost This causes the application to start its own ZooKeeper server inside the JVM/application. It immediately goes into a runaway state. The name of the runaway thread is "NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181". When I suspend this thread, the CPU usage returns to 0. Here is a stack trace from that thread when it is suspended: EPollArrayWrapper.epollWait(long, int, long, int) line: not available [native method] EPollArrayWrapper.poll(long) line: 215 EPollSelectorImpl.doSelect(long) line: 77 EPollSelectorImpl(SelectorImpl).lockAndDoSelect(long) line: 69 EPollSelectorImpl(SelectorImpl).select(long) line: 80 NIOServerCnxn$Factory.run() line: 232 Any ideas what might be going wrong? Thanks. |
214195 | No Perforce job exists for this issue. | 1 | 32842 | 9 years, 29 weeks, 6 days ago | 0|i05zg7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-862 | Hedwig created ledgers with hardcoded Bookkeeper ensemble and quorum size. Make these a server config parameter instead. |
Improvement | Closed | Major | Fixed | Erwin Tam | Erwin Tam | Erwin Tam | 02/Sep/10 13:47 | 23/Nov/11 14:22 | 05/Nov/10 02:25 | 3.4.0 | contrib-hedwig | 0 | 1 | Hedwig code right now when using Bookkeeper as the persistence store is hardcoding the number of bookie servers in the ensemble and quorum size. This is used the first time a ledger is created. This should be exposed as a server configuration parameter instead. | 47572 | No Perforce job exists for this issue. | 1 | 33377 | 9 years, 20 weeks, 6 days ago |
Reviewed
|
0|i062r3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-861 | Missing the test SSL certificate used for running junit tests. |
Bug | Closed | Minor | Fixed | Erwin Tam | Erwin Tam | Erwin Tam | 02/Sep/10 13:40 | 23/Nov/11 14:22 | 07/Sep/10 14:29 | 3.4.0 | contrib-hedwig | 0 | 2 | The Hedwig code checked into Apache is missing a test SSL certificate file used for running the server junit tests. We need this file otherwise the tests that use this (e.g. TestHedwigHub) will fail. | 47573 | No Perforce job exists for this issue. | 2 | 32843 | 9 years, 28 weeks, 1 day ago |
Reviewed
|
0|i05zgf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-860 | Add alternative search-provider to ZK site |
Improvement | Open | Minor | Unresolved | Alex Baranau | Alex Baranau | Alex Baranau | 02/Sep/10 10:05 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | documentation | 1 | 3 | Use search-hadoop.com service to make available search in ZK sources, MLs, wiki, etc. This was initially proposed on user mailing list (http://search-hadoop.com/m/sTZ4Y1BVKWg1). The search service was already added in site's skin (common for all Hadoop related projects) before (as a part of [AVRO-626|https://issues.apache.org/jira/browse/AVRO-626]) so this issue is about enabling it for ZK. The ultimate goal is to use it at all Hadoop's sub-projects' sites. |
71222 | No Perforce job exists for this issue. | 1 | 42117 | 5 years, 47 weeks, 6 days ago | 0|i07kof: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-859 | Native Windows version of C client |
New Feature | Closed | Major | Duplicate | Ben Collins | Ben Collins | Ben Collins | 31/Aug/10 14:52 | 23/Nov/11 14:22 | 13/Jul/11 22:49 | 3.3.1 | 3.4.0 | c client | 0 | 1 | Windows 7, 64-bit | Use windows sockets and the win32 API for implementing the c client. This would be only useful for the "single-threaded" model, where the IO waiting is taken care of in the calling code. | 68110 | No Perforce job exists for this issue. | 3 | 33378 | 8 years, 37 weeks ago | 0|i062rb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-858 | Zookeeper appears as QuorumPeerMain in jps output, which is not very user-friendly |
Improvement | Open | Major | Unresolved | Unassigned | Jeff Hammerbacher | Jeff Hammerbacher | 30/Aug/10 22:18 | 31/Aug/10 06:27 | 0 | 2 | As noted by Jordan Sissel on Twitter: http://twitter.com/jordansissel/status/22570450969 | 214194 | No Perforce job exists for this issue. | 0 | 42118 | 9 years, 30 weeks, 2 days ago | 0|i07kon: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-857 | clarify client vs. server view of session expiration event |
Bug | Open | Major | Unresolved | Unassigned | qing yan | qing yan | 30/Aug/10 22:14 | 05/Feb/20 07:15 | 3.7.0, 3.5.8 | documentation | 0 | 0 | Per mailing list discussion: <quote> the client only finds out about session expiration events when the client reconnects to the cluster. if zk tells a client that its session is expired, the ephemerals that correspond to that session will already be cleaned up. - deletion of an ephemeral file due to loss of client connection will occur after the client gets a connection loss - deletion of an ephemeral file will precede delivery of a session expiration event to the owner </quote> So session expiration means two things here: server view (ephemeral cleanup) & client view (event delivery); there is no guarantee how long it will take in between, correct? I guess the confusion arises from the documentation which doesn't distinguish these two concepts, e.g. in the javadoc http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/index.html An ephemeral node will be removed by the ZooKeeper automatically when the session associated with the creation of the node expires. It is actually referring to the server view, not the client view. |
70752 | No Perforce job exists for this issue. | 0 | 32844 | 9 years, 30 weeks, 2 days ago | 0|i05zgn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-856 | Connection imbalance leads to overloaded ZK instances |
Bug | Open | Major | Unresolved | Mahadev Konar | Travis Crawford | Travis Crawford | 26/Aug/10 15:10 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | 1 | 7 | We've experienced a number of issues lately where "ruok" requests would take upwards of 10 seconds to return, and ZooKeeper instances were extremely sluggish. The sluggish instance requires a restart to make it responsive again. I believe the issue is that connections are very imbalanced, leading to certain instances having many thousands of connections, while other instances are largely idle. A potential solution is periodically disconnecting/reconnecting to balance connections over time; this seems fine because sessions should not be affected, and therefore ephemeral nodes and watches should not be affected. |
70739 | No Perforce job exists for this issue. | 2 | 32845 | 10 weeks ago | 0|i05zgv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
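The proposed remedy — periodically shedding excess connections so the clients reconnect elsewhere, with sessions (and so ephemerals and watches) surviving the move — reduces to deciding how many connections a server holds above the ensemble average. A hedged sketch with a flat per-server count as the only input; the real decision would need hysteresis and rate limiting:

```python
def connections_to_shed(per_server, me):
    """How many client connections this server should drop to rebalance.

    per_server: server name -> current client connection count.
    me: this server's name. Sheds only the excess above the ensemble
    average; an underloaded server sheds nothing.
    """
    average = sum(per_server.values()) / len(per_server)
    return max(0, int(per_server[me] - average))
```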
| ZooKeeper | ZOOKEEPER-855 | clientPortBindAddress should be clientPortAddress |
Bug | Closed | Trivial | Fixed | Jared Cantwell | Jared Cantwell | Jared Cantwell | 26/Aug/10 10:49 | 23/Nov/11 14:22 | 18/Oct/10 17:56 | 3.3.0, 3.3.1 | 3.3.2, 3.4.0 | documentation | 0 | 1 | The server documentation states that the configuration parameter for binding to a specific ip address is clientPortBindAddress. The code believes the parameter is clientPortAddress. The documentation for 3.3.X versions needs changed to reflect the correct parameter . This parameter was added in ZOOKEEPER-635. | 47574 | No Perforce job exists for this issue. | 2 | 32846 | 9 years, 23 weeks, 2 days ago | 0|i05zh3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-854 | BookKeeper does not compile due to changes in the ZooKeeper code |
Bug | Closed | Major | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 19/Aug/10 16:04 | 23/Nov/11 14:22 | 28/Aug/10 12:11 | 3.3.1 | 3.4.0 | contrib-bookkeeper | 0 | 1 | BookKeeper does not compile due to changes in the NIOServerCnxn class of ZooKeeper. | 47575 | No Perforce job exists for this issue. | 2 | 32847 | 9 years, 28 weeks, 1 day ago |
Reviewed
|
0|i05zhb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-853 | Make zookeeper.is_unrecoverable return True or False and not an integer |
Improvement | Closed | Minor | Fixed | Andrei Savu | Andrei Savu | Andrei Savu | 19/Aug/10 09:07 | 23/Nov/11 14:22 | 30/Aug/10 17:13 | 3.4.0 | contrib-bindings | 0 | 0 | This is a patch that fixes a TODO from the python zookeeper extension, it makes {{zookeeper.is_unrecoverable}} return {{True}} or {{False}} and not an integer. | 47576 | No Perforce job exists for this issue. | 2 | 33379 | 9 years, 28 weeks, 1 day ago | zookeeper.is_unrecoverable returns True or False |
Reviewed
|
0|i062rj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
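The fix amounts to a one-line coercion at the binding boundary: the C extension returns an integer truth value, and callers comparing identity against True break. A sketch, with `handle_state` standing in for whatever integer the C layer actually returns:

```python
def is_unrecoverable(handle_state):
    """Coerce the C extension's integer truth value to a real bool.

    Mirrors the patched behavior: zookeeper.is_unrecoverable should
    return True or False, not the raw int from the C API.
    """
    return bool(handle_state)
```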
| ZooKeeper | ZOOKEEPER-852 | Check path validation in C client |
Task | Open | Major | Unresolved | Unassigned | Thomas Koch | Thomas Koch | 17/Aug/10 04:42 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | c client | 0 | 0 | In ZOOKEEPER-849 we observed that the validation code and the documentation of allowed characters are out of sync. Surely the validation is too permissive. The issue is fixed for the java client in ZOOKEEPER-849. As I'm not familiar with the C client code, I file this separate issue in the hope that somebody may have a look at it. |
70749 | No Perforce job exists for this issue. | 0 | 42119 | 9 years, 32 weeks, 2 days ago | 0|i07kov: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-851 | ZK lets any node to become an observer |
Bug | Open | Critical | Unresolved | Unassigned | Vishal Kher | Vishal Kher | 16/Aug/10 10:12 | 14/Dec/19 06:08 | 3.3.1 | 3.7.0 | quorum, server | 0 | 6 | I had a 3 node cluster running. The zoo.cfg on each contained 3 entries as shown below: tickTime=2000 dataDir=/var/zookeeper clientPort=2181 initLimit=5 syncLimit=2 server.0=10.150.27.61:2888:3888 server.1=10.150.27.62:2888:3888 server.2=10.150.27.63:2888:3888 I wanted to add another node to the cluster. In the fourth node's zoo.cfg, I created another entry for that node and started zk server. The zoo.cfg on the first 3 nodes was left unchanged. The fourth node was able to join the cluster even though the 3 nodes had no idea about the fourth node. zoo.cfg on fourth node: tickTime=2000 dataDir=/var/zookeeper clientPort=2181 initLimit=5 syncLimit=2 server.0=10.150.27.61:2888:3888 server.1=10.150.27.62:2888:3888 server.2=10.150.27.63:2888:3888 server.3=10.17.117.71:2888:3888 It looks like 10.17.117.71 is becoming an observer in this case. I was expecting that the leader would reject 10.17.117.71. # telnet 10.17.117.71 2181 Trying 10.17.117.71... Connected to 10.17.117.71. Escape character is '^]'. stat Zookeeper version: 3.3.0--1, built on 04/02/2010 22:40 GMT Clients: /10.17.117.71:37297[1](queued=0,recved=1,sent=0) Latency min/avg/max: 0/0/0 Received: 3 Sent: 2 Outstanding: 0 Zxid: 0x200000065 Mode: follower Node count: 288 |
61797 | No Perforce job exists for this issue. | 1 | 42120 | 3 years, 38 weeks, 6 days ago | 0|i07kp3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
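The missing check described above is a membership test: a leader whose view is assembled from zoo.cfg's server.N lines should refuse quorum connections from server ids outside that view, rather than silently letting the stranger follow. A minimal sketch, not the actual QuorumCnxManager logic:

```python
def accept_peer(view, peer_sid):
    """Should the leader let this server join the quorum?

    view: the set of server ids configured in zoo.cfg (server.N lines).
    peer_sid: the id the connecting server announces. An unknown id,
    like the fourth node in the report above, is rejected.
    """
    return peer_sid in view
```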
| ZooKeeper | ZOOKEEPER-850 | Switch from log4j to slf4j |
Improvement | Resolved | Major | Fixed | Olaf Krische | Olaf Krische | Olaf Krische | 16/Aug/10 08:49 | 24/Feb/20 20:21 | 23/Jan/12 19:25 | 3.3.1 | 3.4.0 | java client | 10 | 10 | ZOOKEEPER-1010 | SOLR-2369, ZOOKEEPER-3737, ZOOKEEPER-1371 | Hello, I would like to see slf4j integrated into zookeeper instead of relying explicitly on log4j. slf4j is an abstract logging framework. There are adapters from slf4j to many logger implementations, one of them being log4j. I don't want to make the decision of which log engine to use so early. This would help me to embed zookeeper in my own applications (which use a different logger implementation, but slf4j is the basis). What do you think? (As I can see, such slf4j requests flood all other Apache projects as well :-) Maybe for 3.4 or 4.0? I can offer a patchset; I have experience with such a migration already. :-) |
175 | No Perforce job exists for this issue. | 6 | 33380 | 6 years, 28 weeks, 2 days ago | * replaces log4j with slf4j code (also in contrib for bookkeeper, zooinspector,rest,loggraph), added slf4j dependencies into several ivy.xml files * you must add slf4j-api-1.6.1.jar and slf4j-log4j12-1.6.1.jar (bridge from sl4j to log4j) to the classpath, if not using the standard scripts * log4j remains as the final logger yet, there is still work to do: remove programmatic access to the log4j api from certain classes (which add appenders or configure log4j at runtime), or move them to contrib |
Reviewed
|
0|i062rr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-849 | ZOOKEEPER-835 Provide Path class |
Sub-task | Open | Major | Unresolved | Thomas Koch | Thomas Koch | Thomas Koch | 16/Aug/10 08:28 | 14/Dec/19 06:08 | 3.7.0 | java client | 0 | 5 | ZOOKEEPER-12, ZOOKEEPER-616, ZOOKEEPER-324 | 39 | No Perforce job exists for this issue. | 7 | 42121 | 6 years, 2 weeks, 6 days ago | 0|i07kpb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-848 | Implement the Failure Detector module in the C client |
Improvement | Open | Major | Unresolved | Unassigned | Abmar Barros | Abmar Barros | 16/Aug/10 00:11 | 24/Feb/11 23:24 | c client | 0 | 0 | The failure detector module https://issues.apache.org/jira/browse/ZOOKEEPER-702 is only used in the java client of ZooKeeper, once it reuses the implementation written in Java. The failure detectors must be written in C and the C client must be refactored to use them. | 214193 | No Perforce job exists for this issue. | 0 | 42122 | 9 years, 4 weeks, 6 days ago | failure detector C client | 0|i07kpj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-847 | Missing acl check in zookeeper create |
Bug | Open | Major | Unresolved | Unassigned | Patrick Datko | Patrick Datko | 13/Aug/10 09:01 | 05/Feb/20 07:17 | 3.3.1, 3.3.2, 3.3.3 | 3.7.0, 3.5.8 | java client | 0 | 3 | I looked at the source of the zookeeper class and noticed a missing acl check in the asynchronous version of the create operation. Is there any reason that the async version has no check whether the acl is valid, or did someone forget to implement it? It's interesting because we are working on a refactoring of the zookeeper client and don't want to reimplement a bug. The following code is missing: if (acl != null && acl.size() == 0) { throw new KeeperException.InvalidACLException(); } |
172 | No Perforce job exists for this issue. | 1 | 32848 | 8 years, 26 weeks, 2 days ago | acl-check | 0|i05zhj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
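The quoted Java check translates directly. A Python sketch of the same rule the synchronous create() enforces — a null ACL passes through, an explicitly empty list is invalid — which the async path could mirror:

```python
def validate_acl(acl):
    """Mirror of the sync-path ACL check missing from the async create.

    Matches the quoted Java: None (null) is allowed, but a non-None
    empty list raises, standing in for
    KeeperException.InvalidACLException.
    """
    if acl is not None and len(acl) == 0:
        raise ValueError("InvalidACL: empty ACL list")
    return acl
```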
| ZooKeeper | ZOOKEEPER-846 | zookeeper client doesn't shut down cleanly on the close call |
Bug | Closed | Blocker | Fixed | Patrick D. Hunt | Ted Yu | Ted Yu | 12/Aug/10 13:19 | 23/Nov/11 14:22 | 22/Sep/10 02:39 | 3.2.2 | 3.3.2, 3.4.0 | java client | 0 | 3 | ZOOKEEPER-126, HBASE-2966 | Using HBase 0.20.6 (with HBASE-2473) we encountered a situation where Regionserver process was shutting down and seemed to hang. Here is the bottom of region server log: http://pastebin.com/YYawJ4jA zookeeper-3.2.2 is used. Here is relevant portion from jstack - I attempted to attach jstack twice in my email to dev@hbase.apache.org but failed: "DestroyJavaVM" prio=10 tid=0x00002aabb849c800 nid=0x6c60 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE "regionserver/10.32.42.245:60020" prio=10 tid=0x00002aabb84ce000 nid=0x6c81 in Object.wait() [0x0000000043755000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet) at java.lang.Object.wait(Object.java:485) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099) - locked <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077) at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505) - locked <0x00002aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654) at java.lang.Thread.run(Thread.java:619) "main-EventThread" daemon prio=10 tid=0x0000000043474000 nid=0x6c80 waiting on condition [0x00000000413f3000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00002aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414) |
47577 | No Perforce job exists for this issue. | 2 | 32849 | 9 years, 27 weeks, 1 day ago |
Reviewed
|
0|i05zhr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-845 | remove duplicate code from netty and nio ServerCnxn classes |
Improvement | Resolved | Major | Duplicate | Mohammad Arshad | Benjamin Reed | Benjamin Reed | 12/Aug/10 12:53 | 08/Aug/16 10:30 | 08/Aug/16 10:30 | 3.5.1 | server | 1 | 3 | ZOOKEEPER-2140, ZOOKEEPER-733 | the code for handling the 4-letter words is duplicated between the nio and netty versions of ServerCnxn. this makes maintenance problematic. | 70783 | No Perforce job exists for this issue. | 0 | 42123 | 3 years, 32 weeks, 3 days ago | 0|i07kpr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-844 | handle auth failure in java client |
Bug | Closed | Major | Fixed | Camille Fournier | Camille Fournier | Camille Fournier | 12/Aug/10 12:05 | 23/Nov/11 14:22 | 06/Oct/10 12:19 | 3.3.1 | 3.3.2, 3.4.0 | java client | 0 | 1 | ClientCnxn.java currently has the following code: if (replyHdr.getXid() == -4) { // -4 is the xid for AuthPacket // TODO: process AuthPacket here if (LOG.isDebugEnabled()) { LOG.debug("Got auth sessionid:0x" + Long.toHexString(sessionId)); } return; } Auth failures appear to cause the server to disconnect, but the client never gets a proper state change or notification that auth has failed, which makes handling this scenario very difficult: it causes the client to go into a loop of sending bad auth, getting disconnected, trying to reconnect, sending bad auth again, over and over. |
47578 | No Perforce job exists for this issue. | 2 | 32850 | 9 years, 25 weeks ago |
Reviewed
|
0|i05zhz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-843 | ZOOKEEPER-835 Session class? |
Sub-task | Open | Major | Unresolved | Unassigned | Patrick Datko | Patrick Datko | 11/Aug/10 11:36 | 11/Aug/10 11:57 | 3.3.1 | java client | 0 | 1 | ZOOKEEPER-14 | Maybe it'd make sense to combine hostlist, sessionId, sessionPassword and sessionTimeout in a Session class so that the ctor of ClientCnxn won't get too long? |
214192 | No Perforce job exists for this issue. | 0 | 42124 | 9 years, 33 weeks, 1 day ago | session, session class, refactored class | 0|i07kpz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-842 | ZOOKEEPER-835 stat calls static method on org.apache.zookeeper.server.DataTree |
Sub-task | Open | Major | Unresolved | Unassigned | Patrick Datko | Patrick Datko | 11/Aug/10 11:34 | 11/Aug/10 11:35 | 3.3.1 | java client | 0 | 1 | It's a huge jump from client code to the internal server class DataTree. Shouldn't there rather be some class related to the protobuffer stat class that knows how to copy a stat? |
214191 | No Perforce job exists for this issue. | 0 | 42125 | 9 years, 33 weeks, 1 day ago | DataTree, protobuffer | 0|i07kq7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-841 | ZOOKEEPER-835 stat is returned by parameter |
Sub-task | Open | Major | Unresolved | Unassigned | Patrick Datko | Patrick Datko | 11/Aug/10 10:57 | 11/Aug/10 10:59 | 3.3.1 | java client | 0 | 1 | Since one can return only one value in Java, it's the only choice to do so. Still, it feels more like C than like Java. However, with operation classes one could simply get the result values with getter functions after the execution. |
214190 | No Perforce job exists for this issue. | 0 | 42126 | 9 years, 33 weeks, 1 day ago | 0|i07kqf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-840 | ZOOKEEPER-835 massive code duplication in zookeeper class |
Sub-task | Open | Major | Unresolved | Thomas Koch | Patrick Datko | Patrick Datko | 11/Aug/10 10:56 | 16/Aug/10 08:20 | 0 | 1 | Each operation calls validatePath, handles the chroot, calls ClientCnxn and checks the return header for error. I'd like to address this with the operation classes: Each operation should receive a prechecked Path object. Calling ClientCnxn and error checking is not (or only partly) the concern of the operation but of an "executor" like class. |
214189 | No Perforce job exists for this issue. | 0 | 42127 | 9 years, 33 weeks, 1 day ago | code duplication | 0|i07kqn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-839 | ZOOKEEPER-835 deleteRecursive does not belong to the other methods |
Sub-task | Closed | Blocker | Fixed | Mahadev Konar | Patrick Datko | Patrick Datko | 11/Aug/10 10:53 | 23/Nov/11 14:22 | 14/Aug/11 12:41 | 3.3.1 | 3.4.0 | java client | 0 | 0 | deleteRecursive has already been committed to trunk as a method on the ZooKeeper class. So in the API it sits at the same level as the atomic operations create, delete, getData, setData, etc. The user may get the false impression that deleteRecursive is also an atomic operation. It would be better to have deleteRecursive in some helper class, not that deep in zookeeper's core code. Maybe I'd like to have another policy on how to react if deleteRecursive fails in the middle of its work? |
47579 | No Perforce job exists for this issue. | 1 | 33381 | 8 years, 32 weeks, 3 days ago |
Reviewed
|
atomic operations | 0|i062rz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-838 | ZOOKEEPER-835 Chroot is an attribute of ClientCnxn |
Sub-task | Open | Major | Unresolved | Unassigned | Patrick Datko | Patrick Datko | 11/Aug/10 10:50 | 21/Dec/10 15:59 | 0 | 1 | ZOOKEEPER-961 | Consider one process that uses ZooKeeper for different things (managing a list of work, locking some unrelated locks elsewhere). There are components that do this work inside the same process. These components should get the same zookeeper-client reference, chroot'ed for their needs. So it'd be much better if ClientCnxn did not care about the chroot. |
214188 | No Perforce job exists for this issue. | 0 | 42128 | 9 years, 33 weeks ago | chroot | 0|i07kqv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-837 | ZOOKEEPER-835 cyclic dependency ClientCnxn, ZooKeeper |
Sub-task | Open | Major | Unresolved | Thomas Koch | Patrick Datko | Patrick Datko | 11/Aug/10 10:47 | 05/Feb/20 07:16 | 3.3.1 | 3.7.0, 3.5.8 | java client | 0 | 2 | 0 | 20400 | ZOOKEEPER-666, ZOOKEEPER-911 | ZooKeeper instantiates ClientCnxn in its ctor with this, and therefore builds a cyclic dependency graph between both objects. This means you can't have the one without the other. So why bother to make them separate classes in the first place? ClientCnxn accesses ZooKeeper.state; state should rather be a property of ClientCnxn. And ClientCnxn accesses zooKeeper.get???Watches() in its method primeConnection(). I've not yet checked how this dependency should be resolved better. |
100% | 100% | 20400 | 0 | pull-request-available | 60545 | No Perforce job exists for this issue. | 5 | 42129 | 8 years, 35 weeks, 1 day ago | cyclic dependency | 0|i07kr3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-836 | ZOOKEEPER-835 hostlist as string |
Sub-task | Resolved | Major | Fixed | Thomas Koch | Patrick Datko | Patrick Datko | 11/Aug/10 10:46 | 01/Dec/10 05:52 | 30/Nov/10 15:47 | 3.3.1 | java client | 0 | 1 | ZOOKEEPER-762, ZOOKEEPER-338, ZOOKEEPER-781, ZOOKEEPER-146 | The hostlist is parsed in the ctor of ClientCnxn. This violates the rule of not doing (too much) work in a ctor. Instead the ClientCnxn should receive an object of class "HostSet". HostSet could then be instantiated e.g. with a comma separated string. |
47580 | No Perforce job exists for this issue. | 5 | 33382 | 9 years, 17 weeks, 1 day ago |
Reviewed
|
hostliste, comma seperated | 0|i062s7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-835 | Refactoring Zookeeper Client Code |
Improvement | Open | Major | Unresolved | Thomas Koch | Patrick Datko | Patrick Datko | 11/Aug/10 10:41 | 27/Dec/10 23:53 | 3.3.1 | java client | 0 | 3 | ZOOKEEPER-836, ZOOKEEPER-837, ZOOKEEPER-838, ZOOKEEPER-839, ZOOKEEPER-840, ZOOKEEPER-841, ZOOKEEPER-842, ZOOKEEPER-843, ZOOKEEPER-849, ZOOKEEPER-868, ZOOKEEPER-878, ZOOKEEPER-879, ZOOKEEPER-894, ZOOKEEPER-908, ZOOKEEPER-910, ZOOKEEPER-969, ZOOKEEPER-970 | ZOOKEEPER-666, ZOOKEEPER-794, ZOOKEEPER-22, ZOOKEEPER-823, ZOOKEEPER-277 | Thomas Koch asked me to file individual issues for the points raised in his mail to zookeeper-dev: [Mail of Thomas Koch| http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3C201008111145.17507.thomas@koch.ro%3E ] He described several issues that are present in the current zookeeper client, so a refactoring of the code would be a benefit for other developers working with zookeeper. |
100% | 20400 | 0 | 214187 | No Perforce job exists for this issue. | 0 | 42130 | 9 years, 23 weeks ago | Zookeeper client code, refactoring, improvement, client, code | 0|i07krb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-834 | Allow ephemeral znodes to have children created only by the owner session. |
New Feature | Resolved | Major | Duplicate | Rakesh Radhakrishnan | Andrei Savu | Andrei Savu | 06/Aug/10 15:58 | 20/Mar/19 11:28 | 20/Mar/19 11:28 | c client, java client, server | 3 | 12 | ZOOKEEPER-723, ZOOKEEPER-2163 | Ephemeral znodes are automatically removed when the client session is closed or expires and this behavior makes them very useful when you want to publish status information from active / connected clients. But there is a catch. Right now ephemerals can't have children znodes and because of that clients need to serialize status information as byte strings. This serialization renders that information almost invisible to generic zookeeper clients and hard / inefficient to update. Most of the time the status information can be expressed as a bunch of (key, value) pairs and we could easily store that using child znodes. Any ZooKeeper client can read that info without the need to reverse the serialization process and we can also easily update it. I suggest that the server should allow the ephemeral znodes to have children znodes. Each child should also be an ephemeral znode owned by the same session - parent ephemeralOwner session. Mail Archive: http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg09819.html Another discussion about the same topic: http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg08165.html |
container_znode_type | 40 | No Perforce job exists for this issue. | 4 | 2584 | 1 year, 1 day ago | 0|i00spr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-833 | Attachments in the wiki do not work (so no presentations) |
Bug | Resolved | Major | Fixed | Thomas Koch | Bruce Mitchener | Bruce Mitchener | 06/Aug/10 15:17 | 05/Sep/11 15:55 | 05/Sep/11 15:55 | documentation | 0 | 1 | This is apparently a known issue: http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201005.mbox/%3C562709E0-0516-481F-87AD-2039A564E5BD@yahoo-inc.com%3E None of the attachments on the Presentations page in the wiki work (nor does the link to the screenshot on the performance page). |
47581 | No Perforce job exists for this issue. | 0 | 32851 | 8 years, 29 weeks, 3 days ago | 0|i05zi7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-832 | Invalid session id causes infinite loop during automatic reconnect |
Bug | Patch Available | Critical | Unresolved | Mohammad Arshad | Ryan Holmes | Ryan Holmes | 05/Aug/10 15:16 | 02/Jul/19 21:49 | 3.4.5, 3.5.0, 3.4.11 | server | 13 | 41 | ZOOKEEPER-1777 | All | Steps to reproduce: 1.) Connect to a standalone server using the Java client. 2.) Stop the server. 3.) Delete the contents of the data directory (i.e. the persisted session data). 4.) Start the server. The client now automatically tries to reconnect but the server refuses the connection because the session id is invalid. The client and server are now in an infinite loop of attempted and rejected connections. While this situation represents a catastrophic failure and the current behavior is not incorrect, it appears that there is no way to detect this situation on the client and therefore no way to recover. The suggested improvement is to send an event to the default watcher indicating that the current state is "session invalid", similar to how the "session expired" state is handled. Server log output (repeats indefinitely): 2010-08-05 11:48:08,283 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@250] - Accepted socket connection from /127.0.0.1:63292 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@751] - Refusing session request for client /127.0.0.1:63292 as it has seen zxid 0x44 our last zxid is 0x0 client must try another server 2010-08-05 11:48:08,284 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1434] - Closed socket connection for client /127.0.0.1:63292 (no session established for client) Client log output (repeats indefinitely): 11:47:17 org.apache.zookeeper.ClientCnxn startConnect INFO line 1000 - Opening socket connection to server localhost/127.0.0.1:2181 11:47:17 org.apache.zookeeper.ClientCnxn run WARN line 1120 - Session 0x12a3ae4e893000a for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused at 
sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1167 - Ignoring exception during shutdown input java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638) at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1164) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) 11:47:17 org.apache.zookeeper.ClientCnxn cleanup DEBUG line 1174 - Ignoring exception during shutdown output java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649) at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1171) at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1129) |
63862 | No Perforce job exists for this issue. | 10 | 42131 | 37 weeks, 1 day ago | 0|i07krj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-831 | BookKeeper: Throttling improved for reads |
Bug | Closed | Major | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 04/Aug/10 17:25 | 01/May/13 22:29 | 17/Sep/10 12:59 | 3.3.1 | 3.4.0 | contrib-bookkeeper | 0 | 2 | BOOKKEEPER-4 | Reads and writes in BookKeeper are asymmetric: a write request writes one entry, whereas a read request may read multiple requests. The current implementation of throttling only counts the number of read requests instead of counting the number of entries being read. Consequently, a few read requests reading a large number of entries each will spawn a large number of read-entry requests. | 47582 | No Perforce job exists for this issue. | 4 | 32852 | 9 years, 27 weeks, 5 days ago | 0|i05zif: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-830 | ZOOKEEPER-704 forrest docs for read-only mode |
Sub-task | Open | Major | Unresolved | Sergey Doroshenko | Sergey Doroshenko | Sergey Doroshenko | 02/Aug/10 15:36 | 05/Feb/16 12:38 | 0 | 1 | 214186 | No Perforce job exists for this issue. | 2 | 42132 | 4 years, 6 weeks, 6 days ago | 0|i07krr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-829 | Add /zookeeper/sessions/* to allow inspection/manipulation of client sessions |
New Feature | Open | Major | Unresolved | Marshall McMullen | Todd Lipcon | Todd Lipcon | 29/Jul/10 13:25 | 13/Dec/12 03:09 | server | 1 | 12 | ZOOKEEPER-1587, HBASE-1316 | For some use cases in HBase (HBASE-1316 in particular) we'd like the ability to forcibly expire someone else's ZK session. Patrick and I discussed on IRC and came up with an idea of creating nodes in /zookeeper/sessions/<session id> that can be read in order to get basic stats about a session, and written in order to manipulate one. The manipulation we need in HBase is the ability to write a command like "kill", but others might be useful as well. | 214185 | No Perforce job exists for this issue. | 2 | 42133 | 8 years, 4 weeks, 2 days ago | 0|i07krz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-827 | ZOOKEEPER-704 enable r/o mode in C client library |
Sub-task | Resolved | Major | Fixed | Raúl Gutiérrez Segalés | Sergey Doroshenko | Sergey Doroshenko | 21/Jul/10 14:06 | 02/May/15 16:34 | 07/Jul/14 17:44 | 3.5.0 | 0 | 6 | ZOOKEEPER-2178 | Implement read-only mode functionality (in accordance with http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode) in C client library | 214184 | No Perforce job exists for this issue. | 10 | 42134 | 4 years, 46 weeks, 5 days ago | 0|i07ks7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-826 | cli.c should not call zoo_add_auth immediately after zookeeper_init() |
Bug | Open | Minor | Unresolved | Unassigned | Michi Mutsuzaki | Michi Mutsuzaki | 21/Jul/10 03:13 | 21/Jul/10 05:06 | 3.3.1 | c client | 0 | 0 | In cli.c, zoo_add_auth() gets called right after zookeeper_init(). Instead, zoo_add_auth() should be called in the callback after the connection is established. --Michi |
214183 | No Perforce job exists for this issue. | 0 | 32853 | 9 years, 36 weeks, 1 day ago | 0|i05zin: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-823 | update ZooKeeper java client to optionally use Netty for connections |
New Feature | Open | Major | Unresolved | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 19/Jul/10 17:57 | 05/Feb/20 07:17 | 3.7.0, 3.5.8 | java client | 2 | 10 | ZOOKEEPER-909 | ZOOKEEPER-895, ZOOKEEPER-1164, ZOOKEEPER-835, GIRAPH-211, ZOOKEEPER-733, ZOOKEEPER-1681, ZOOKEEPER-702 | This jira will port the client side connection code to use netty rather than direct nio. | 63346 | No Perforce job exists for this issue. | 18 | 42135 | 5 years, 29 weeks, 2 days ago | 0|i07ksf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-822 | Leader election taking a long time to complete |
Bug | Closed | Blocker | Fixed | Vishal Kher | Vishal Kher | Vishal Kher | 19/Jul/10 11:51 | 23/Nov/11 14:22 | 06/Oct/10 13:03 | 3.3.0 | 3.3.2, 3.4.0 | quorum | 0 | 4 | Created a 3 node cluster. 1. Fail the ZK leader. 2. Let leader election finish. Restart the leader and let it join. 3. Repeat. After a few rounds, leader election takes anywhere from 25-60 seconds to finish. Note: we didn't have any ZK clients and no new znodes were created. zoo.cfg is shown below: #Mon Jul 19 12:15:10 UTC 2010 server.1=192.168.4.12\:2888\:3888 server.0=192.168.4.11\:2888\:3888 clientPort=2181 dataDir=/var/zookeeper syncLimit=2 server.2=192.168.4.13\:2888\:3888 initLimit=5 tickTime=2000 I have attached logs from two nodes that took a long time to form the cluster after failing the leader. The leader was down anyway, so logs from that node shouldn't matter. Look for "START HERE". Logs after that point should be of interest. |
47583 | No Perforce job exists for this issue. | 17 | 32854 | 9 years, 25 weeks ago |
Reviewed
|
0|i05ziv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-821 | Add ZooKeeper version information to zkpython |
Improvement | Closed | Trivial | Fixed | Rich Schumacher | Rich Schumacher | Rich Schumacher | 16/Jul/10 17:30 | 23/Nov/11 14:22 | 26/Jul/10 17:50 | 3.3.1 | 3.4.0 | contrib-bindings | 0 | 0 | Since installing and using ZooKeeper I've built and installed no less than four versions of the zkpython bindings. It would be really helpful if the module had a '__version__' attribute to easily tell which version is currently in use. | 47584 | No Perforce job exists for this issue. | 1 | 33383 | 9 years, 35 weeks, 2 days ago | Add a version number to zkpython releases. |
Reviewed
|
0|i062sf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-820 | update c unit tests to ensure "zombie" java server processes don't cause failure |
Bug | Closed | Critical | Fixed | Michi Mutsuzaki | Patrick D. Hunt | Patrick D. Hunt | 16/Jul/10 12:00 | 23/Nov/11 14:22 | 20/Oct/10 14:46 | 3.3.1 | 3.3.2, 3.4.0 | 0 | 1 | When the c unit tests are run, sometimes the server doesn't shut down at the end of the test; this causes subsequent tests (esp. on Hudson) to fail. 1) We should try harder to make the server shut down at the end of the test; I suspect this is related to test failure/cleanup. 2) Before the tests are run, we should see if the old server is still running and try to shut it down. |
47585 | No Perforce job exists for this issue. | 4 | 32855 | 9 years, 23 weeks, 1 day ago |
Reviewed
|
0|i05zj3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-819 | ZOOKEEPER-816 build the checking tool |
Sub-task | Open | Minor | Unresolved | Unassigned | Miguel Correia | Miguel Correia | 16/Jul/10 11:03 | 16/Jul/10 11:03 | 0 | 0 | Building the checking tool is the hardest part of the project. It involves putting the traces together in a unified trace and checking if this unified trace shows that Zookeeper is satisfying a set of properties (e.g., a getData returns what was stored by the previous setData or create). | 214182 | No Perforce job exists for this issue. | 0 | 42136 | 9 years, 36 weeks, 6 days ago | 0|i07ksn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-818 | ZOOKEEPER-816 improve the traces with additional information needed |
Sub-task | Open | Minor | Unresolved | Unassigned | Miguel Correia | Miguel Correia | 16/Jul/10 11:01 | 16/Jul/10 11:01 | 0 | 0 | The current traces do not include all the information we need to do the checking. The main additions would be to log the replies and hashes of values read/written. | 214181 | No Perforce job exists for this issue. | 0 | 42137 | 9 years, 36 weeks, 6 days ago | 0|i07ksv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-817 | ZOOKEEPER-816 improve the efficiency of tracing |
Sub-task | Open | Minor | Unresolved | Unassigned | Miguel Correia | Miguel Correia | 16/Jul/10 11:00 | 16/Jul/10 11:00 | 0 | 0 | Zookeeper uses two kinds of logs: logs for information and debugging (the ones considered in this project) and transaction logs (needed for Zab/Paxos to be fault tolerant); the latter are very efficient, so the idea would be to make the former likewise efficient using similar mechanisms. | 214180 | No Perforce job exists for this issue. | 0 | 42138 | 9 years, 36 weeks, 6 days ago | 0|i07kt3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-816 | Detecting and diagnosing elusive bugs and faults in Zookeeper |
New Feature | Open | Minor | Unresolved | Unassigned | Miguel Correia | Miguel Correia | 16/Jul/10 10:55 | 20/Jul/10 16:48 | 0 | 1 | ZOOKEEPER-817, ZOOKEEPER-818, ZOOKEEPER-819 | Complex distributed systems like Zookeeper tend to fail in strange ways that are hard to diagnose. The objective is to build a tool that helps understand when and where these problems occurred based on Zookeeper's traces (i.e., logs in TRACE level). Minor changes to the server code will be needed. | 214179 | No Perforce job exists for this issue. | 0 | 42139 | 9 years, 36 weeks, 2 days ago | 0|i07ktb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-815 | fill in "TBD"s in overview doc |
Bug | Open | Minor | Unresolved | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 15/Jul/10 16:14 | 14/Dec/19 06:09 | 3.3.1 | 3.7.0 | documentation | 1 | 2 | ZOOKEEPER-2090 | Funny: "Ephemeral nodes are useful when you want to implement [tbd]." There are a few others in that doc that should really be fixed. |
documentation | 70791 | No Perforce job exists for this issue. | 0 | 32856 | 9 years, 22 weeks ago | 0|i05zjb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-814 | monitoring scripts are missing apache license headers |
Bug | Closed | Blocker | Fixed | Andrei Savu | Patrick D. Hunt | Patrick D. Hunt | 14/Jul/10 02:59 | 23/Nov/11 14:22 | 26/Jul/10 18:01 | 3.4.0 | contrib | 0 | 1 | Andrei, I just realized that src/contrib/monitoring files are missing apache license headers. Please add them (in particular any script files like python, see similar files in svn for examples - in some cases like README it's not strictly necessary.) You can run the RAT tool to verify (see build.xml or http://incubator.apache.org/rat/) |
47586 | No Perforce job exists for this issue. | 1 | 32857 | 9 years, 35 weeks, 2 days ago |
Reviewed
|
0|i05zjj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-813 | maven install is broken due to incorrect organisation |
Bug | Closed | Critical | Duplicate | Jeff Hodges | Jeff Hodges | Jeff Hodges | 12/Jul/10 03:14 | 23/Nov/11 14:22 | 12/Jul/10 18:49 | 3.3.1 | 3.3.2, 3.4.0 | build | 0 | 0 | ZOOKEEPER-787 | SBT doesn't like the pom file for zookeeper because while it's under the "org.apache.hadoop" directory, its organisation is actually "org.apache.zookeeper". A simple fix for this is to just change "org.apache.zookeeper" to "org.apache.hadoop". | 214178 | No Perforce job exists for this issue. | 0 | 32858 | 9 years, 37 weeks, 3 days ago | 0|i05zjr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-812 | ZOOKEEPER-702 Failure Detector Model: Evaluate QoS metrics |
Sub-task | Open | Major | Unresolved | Abmar Barros | Abmar Barros | Abmar Barros | 12/Jul/10 03:03 | 12/Jul/10 03:03 | 0 | 0 | 214177 | No Perforce job exists for this issue. | 0 | 42140 | 9 years, 37 weeks, 3 days ago | 0|i07ktj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-811 | ZOOKEEPER-702 Failure Detector Model: Refactor server to server monitoring |
Sub-task | Open | Major | Unresolved | Abmar Barros | Abmar Barros | Abmar Barros | 12/Jul/10 02:51 | 29/Jul/10 14:26 | 0 | 0 | Refactor server to server failure detection code to use the FailureDetector interface proposed in the parent JIRA. The failure detection method and its parameters should also be configurable in this case. Patches submitted in this JIRA use the latest patch of the parent JIRA as baseline. |
214176 | No Perforce job exists for this issue. | 1 | 42141 | 9 years, 35 weeks ago | 0|i07ktr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-810 | ZOOKEEPER-702 Failure Detector Model: Write Forrest docs |
Sub-task | Open | Major | Unresolved | Abmar Barros | Abmar Barros | Abmar Barros | 12/Jul/10 02:44 | 12/Jul/10 02:44 | 0 | 0 | Write forrest docs about the Failure Detector Model implementation. This documentation should help one to understand how the failure detection model works on ZooKeeper, both on client and server sides. The usage and configuration of this feature should also be addressed in this documentation. |
214175 | No Perforce job exists for this issue. | 0 | 42142 | 9 years, 37 weeks, 3 days ago | failure detector forrest doc | 0|i07ktz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-809 | Improved REST Interface |
Improvement | Closed | Major | Fixed | Andrei Savu | Andrei Savu | Andrei Savu | 10/Jul/10 11:18 | 23/Nov/11 14:22 | 17/Aug/10 03:26 | 3.4.0 | contrib | 0 | 0 | ZOOKEEPER-808, ZOOKEEPER-701 | I would like to extend the existing REST Interface to also support: * configuration * ephemeral znodes * watches - PubSubHubbub * ACLs * basic authentication I want to do this because when building web applications that talk directly to ZooKeeper, a REST API is a lot easier to use (there is no protocol mismatch) than an API that uses persistent connections. I plan to use the improved version to build a web-based administrative interface. |
47587 | No Perforce job exists for this issue. | 9 | 33384 | 9 years, 32 weeks, 2 days ago |
Reviewed
|
0|i062sn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-808 | Web-based Administrative Interface |
New Feature | Closed | Major | Fixed | Andrei Savu | Andrei Savu | Andrei Savu | 10/Jul/10 10:38 | 23/Nov/11 14:22 | 18/Aug/10 01:53 | 3.4.0 | contrib | 0 | 0 | ZOOKEEPER-809, ZOOKEEPER-701 | Implement a web-based administrative interface that should allow the user to perform all the tasks that can be done using the interactive shell (zkCli.sh) from a browser. It should also display cluster and individual server info extracted using the 4letter word commands. I'm going to build starting from the http://github.com/phunt/zookeeper_dashboard implemented by Patrick Hunt. |
47588 | No Perforce job exists for this issue. | 1 | 33385 | 9 years, 28 weeks, 1 day ago |
Reviewed
|
web, interface, contrib | 0|i062sv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-806 | Cluster management with Zookeeper - Norbert |
New Feature | Resolved | Major | Later | Unassigned | John Wang | John Wang | 07/Jul/10 10:44 | 22/Feb/13 00:48 | 22/Feb/13 00:48 | 0 | 1 | Hello, we have built a cluster management layer on top of Zookeeper here at the SNA team at LinkedIn: http://sna-projects.com/norbert/ We were wondering about ways to collaborate, as this is a very useful application of zookeeper. |
214174 | No Perforce job exists for this issue. | 0 | 42143 | 9 years, 37 weeks, 5 days ago | 0|i07ku7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-805 | four letter words fail with latest ubuntu nc.openbsd |
Bug | Resolved | Critical | Fixed | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 06/Jul/10 19:48 | 30/Apr/14 16:19 | 30/Apr/14 16:19 | 3.3.1, 3.4.0 | 3.4.6 | documentation, server | 0 | 3 | ZOOKEEPER-1197, ZOOKEEPER-737 | In both 3.3 branch and trunk "echo stat|nc localhost 2181" fails against the ZK server on Ubuntu Lucid Lynx. I noticed this after upgrading to lucid lynx - which is now shipping openbsd nc as the default: OpenBSD netcat (Debian patchlevel 1.89-3ubuntu2) vs nc traditional [v1.10-38] which works fine. Not sure if this is a bug in us or nc.openbsd, but it's currently not working for me. Ugh. |
71226 | No Perforce job exists for this issue. | 0 | 32859 | 5 years, 47 weeks, 1 day ago | 0|i05zjz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-804 | c unit tests failing due to "assertion cptr failed" |
Bug | Closed | Critical | Fixed | Michi Mutsuzaki | Patrick D. Hunt | Patrick D. Hunt | 05/Jul/10 16:35 | 23/Nov/11 14:22 | 20/Oct/10 12:27 | 3.4.0 | 3.3.2, 3.4.0 | c client | 0 | 1 | ZOOKEEPER-707 | gcc 4.4.3, ubuntu lucid lynx, dual core laptop (intel) | I'm seeing this frequently: [exec] Zookeeper_simpleSystem::testPing : elapsed 18006 : OK [exec] Zookeeper_simpleSystem::testAcl : elapsed 1022 : OK [exec] Zookeeper_simpleSystem::testChroot : elapsed 3145 : OK [exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started : elapsed 25687 : OK [exec] zktest-mt: /home/phunt/dev/workspace/gitzk/src/c/src/zookeeper.c:1952: zookeeper_process: Assertion `cptr' failed. [exec] make: *** [run-check] Aborted [exec] Zookeeper_simpleSystem::testHangingClient Mahadev can you take a look? |
47589 | No Perforce job exists for this issue. | 3 | 32860 | 9 years, 23 weeks, 1 day ago |
Reviewed
|
0|i05zk7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-803 | Improve defenses against misbehaving clients |
Bug | Open | Major | Unresolved | Unassigned | Travis Crawford | Travis Crawford | 02/Jul/10 16:52 | 02/Jul/10 23:03 | 3.3.0 | 0 | 2 | ZOOKEEPER-801 | This issue is in response to ZOOKEEPER-801. Short version is a small number of buggy clients opened thousands of connections and caused Zookeeper to fail. The misbehaving client did not correctly handle expired sessions, creating a new connection each time. The huge number of connections exacerbated the issue. |
214173 | No Perforce job exists for this issue. | 1 | 32861 | 9 years, 38 weeks, 6 days ago | 0|i05zkf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-802 | Improved LogGraph filters + documentation |
Improvement | Open | Minor | Unresolved | Ivan Kelly | Ivan Kelly | Ivan Kelly | 02/Jul/10 11:19 | 05/Feb/20 07:16 | 3.4.0 | 3.7.0, 3.5.8 | 0 | 0 | The log filtering mechanism has been improved and extended to work with message logs. Also, the documentation has been moved into the forrest documentation. | 70751 | No Perforce job exists for this issue. | 6 | 42144 | 6 years, 24 weeks ago | 0|i07kuf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-801 | Zookeeper outage post-mortem |
Improvement | Resolved | Major | Not A Problem | Travis Crawford | Travis Crawford | Travis Crawford | 01/Jul/10 12:39 | 02/Jul/10 23:03 | 02/Jul/10 14:31 | 3.3.0 | 0 | 3 | ZOOKEEPER-803, ZOOKEEPER-517 | - RHEL5 2.6.18 kernel - Zookeeper 3.3.0 - ulimit raised to 65k files - 3 cluster members - 4-5k connections in steady-state - Primarily C and python clients, plus some java |
[Moving a thread from the zookeeper-user] RECOVERY We eventually recovered from this situation by shutting down clients. Initially I tried restarting the Zookeepers, however, they were getting hammered and I believe sessions were timing out. I shut down ~2k clients (lightweight python app; simply sets one data watch and takes an action when it changes) at which point zktop could make a connection and a leader election was verified. After resetting latency stats the numbers were very good. I do not believe it would have ever recovered without shedding load. QUORUM/ELECTIONS DURING EVENT Unfortunately I do not have logs from the event :( We had debug logging on, and logrotate configured to keep 10 100MB files, and the interesting parts rotated away. I have already switched to info logging so we don't lose this data again. During the incident I was not able to view cluster status with zktop, and never observed a successful operation beyond connections, which quickly timed out. GC PAUSE/LOGGING This is a very good question. No, Zookeeper GC is not tuned and uses whatever the defaults are in the start scripts. GC logging is not enabled either. I filed an internal bug against myself to enable logging, and tune GC. CLIENT SESSION TIMEOUTS Clients are not explicitly setting timeouts, and I believe sessionTimeout is 10 seconds based on this log entry when initially connecting. 
2010-07-01 05:14:00,260:44267(0x2af330240110):ZOO_INFO@zookeeper_init@727: Initiating client connection, host=10.209.21.133:2181,10.209.21.175:2181,10.209.21.181:2181 sessionTimeout=10000 watcher=(nil) sessionId=0 sessionPasswd=<null> context=(nil) flags=0 CLIENT BACKOFFS Looking in application logs, we see lots of the following: 2010-07-01 05:13:14,674:41491(0x41ebf940):ZOO_ERROR@handle_socket_error_msg@1528: Socket [10.209.21.181:2181] zk retcode=-7, errno=110(Connection timed out): connection timed out (exceeded timeout by 0ms) Doing some simple aggregations we see 130 errors in a ~13 minute sample period. This behavior on thousands of clients appears to have been a DDoS attack against Zookeeper. Using exponential backoff as the default behavior seems appropriate looking at this data. Fulltext of the client errors is attached. I grepped "errno" from a Python client log; I believe it uses the same underlying C library, so I did not include example output from a C program (though I can if needed). It looks basically the same. GOING FORWARD The long-GC pause causing clients to dogpile sounds like the most plausible explanation at this time. GC logging/tuning is clearly where I dropped the ball, just using the defaults; I don't think any changes should be made related to lack of tuning. Exponential backoffs do seem like a good idea, and generally useful for most people. There will always be service interruptions and backoffs would be a great preventive measure to get out of a dogpile situation. Patrick's message: """ Hi Travis, as Flavio suggested would be great to get the logs. A few questions: 1) how did you eventually recover, restart the zk servers? 2) was the cluster losing quorum during this time? leader re-election? 3) Any chance this could have been initially triggered by a long GC pause on one of the servers? (is gc logging turned on, any sort of heap monitoring?) Has the GC been tuned on the servers, for example CMS and incremental? 
4) what are the clients using for timeout on the sessions? 3.4 probably not for a few months yet, but we are planning for a 3.3.2 in a few weeks to fix a couple critical issues (which don't seem related to what you saw). If we can identify the problem here we should be able to include it in any fix release we do. fixing something like 517 might help, but it's not clear how we got to this state in the first place. fixing 517 might not have any effect if the root cause is not addressed. 662 has only ever been reported once afaik, and we weren't able to identify the root cause for that one. One thing we might also consider is modifying the zk client lib to backoff connection attempts if they keep failing (timing out say). Today the clients are pretty aggressive on reconnection attempts. Having some sort of backoff (exponential?) would provide more breathing room to the server in this situation. Patrick """ Flavio's message: """ Hi Travis, Do you think it would be possible for you to open a jira and upload your logs? Thanks, -Flavio """ My initial message: """ Hey zookeepers - We just experienced a total zookeeper outage, and here's a quick post-mortem of the issue, and some questions about preventing it going forward. Quick overview of the setup: - RHEL5 2.6.18 kernel - Zookeeper 3.3.0 - ulimit raised to 65k files - 3 cluster members - 4-5k connections in steady-state - Primarily C and python clients, plus some java In chronological order, the issue manifested itself as alert about RW tests failing. Logs were full of too many files errors, and the output of netstat showed lots of CLOSE_WAIT and SYN_RECV sockets. CPU was 100%. Application logs showed lots of connection timeouts. This suggests an event happened that caused applications to dogpile on Zookeeper, and eventually the CLOSE_WAIT timeout caused file handles to run out and basically game over. I looked through lots of logs (clients+servers) and did not see a clear indication of what happened. 
Graphs show a sudden decrease in network traffic when the outage began, zookeeper goes cpu bound, and runs out of file descriptors. Clients are primarily a couple thousand C clients using default connection parameters, and a couple thousand python clients using default connection parameters. Digging through Jira we see two issues that probably contributed to this outage: https://issues.apache.org/jira/browse/ZOOKEEPER-662 https://issues.apache.org/jira/browse/ZOOKEEPER-517 Both are tagged for the 3.4.0 release. Anyone know if that's still the case, and when 3.4.0 is roughly scheduled to ship? Thanks! Travis """ |
214172 | No Perforce job exists for this issue. | 2 | 33386 | 9 years, 38 weeks, 6 days ago | zookeeper outage postmortem | 0|i062t3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
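The post-mortem above repeatedly points at aggressive client reconnects and suggests exponential backoff as a remedy. A minimal sketch of that idea in Java (class and parameter names are illustrative, not part of any ZooKeeper API):

```java
import java.util.Random;

/**
 * Sketch of the exponential reconnect backoff discussed in the thread above.
 * All names here are hypothetical; this is not ZooKeeper client code.
 */
public class ReconnectBackoff {
    private final long baseMs;
    private final long capMs;
    private final Random rnd = new Random();
    private int attempt = 0;

    public ReconnectBackoff(long baseMs, long capMs) {
        this.baseMs = baseMs;
        this.capMs = capMs;
    }

    /** Delay before the next attempt: base * 2^attempt, capped, with jitter in [cap/2, cap). */
    public long nextDelayMs() {
        long exp = baseMs << Math.min(attempt, 16); // clamp the shift to avoid overflow
        long capped = Math.min(exp, capMs);
        attempt++;
        return capped / 2 + (long) (rnd.nextDouble() * (capped / 2.0));
    }

    /** Call after a successful (re)connection. */
    public void reset() {
        attempt = 0;
    }
}
```

Capping the delay and adding jitter keeps thousands of clients from retrying in lockstep, which is exactly the dogpile the outage describes.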
| ZooKeeper | ZOOKEEPER-800 | zoo_add_auth returns ZOK if zookeeper handle is in ZOO_CLOSED_STATE |
Bug | Closed | Minor | Fixed | Michi Mutsuzaki | Michi Mutsuzaki | Michi Mutsuzaki | 29/Jun/10 19:26 | 23/Nov/11 14:22 | 21/Oct/10 18:52 | 3.3.1 | 3.3.2, 3.4.0 | c client | 0 | 4 | This happened when I called zoo_add_auth() immediately after zookeeper_init(). It took me a while to figure out that authentication actually failed since zoo_add_auth() returned ZOK. It should return ZINVALIDSTATE instead. --Michi |
47590 | No Perforce job exists for this issue. | 1 | 32862 | 8 years, 40 weeks ago |
Reviewed
|
0|i05zkn: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
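ZOOKEEPER-800's requested fix is a state check before accepting the auth request. A hypothetical Java sketch of that guard (the real fix lives in the C client; the constants below only mirror the C client's error-code names and values):

```java
/**
 * Illustrative sketch, not the actual client code: reject addAuth on a
 * closed handle with ZINVALIDSTATE instead of silently returning ZOK.
 */
public class AuthGuard {
    public static final int ZOK = 0;
    public static final int ZINVALIDSTATE = -9; // matches the C client's value

    public enum State { CONNECTING, CONNECTED, CLOSED }

    private State state = State.CLOSED;

    public void setState(State s) {
        state = s;
    }

    public int addAuth(String scheme, byte[] cred) {
        if (state == State.CLOSED) {
            return ZINVALIDSTATE; // previously this path reported success
        }
        // ... queue the auth packet for delivery to the server ...
        return ZOK;
    }
}
```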
| ZooKeeper | ZOOKEEPER-799 | Add tools and recipes for monitoring as a contrib |
New Feature | Closed | Major | Fixed | Andrei Savu | Andrei Savu | Andrei Savu | 29/Jun/10 17:13 | 17/Sep/12 09:21 | 14/Jul/10 02:41 | 3.4.0 | contrib | 0 | 2 | ZOOKEEPER-744, ZOOKEEPER-701 | Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. | 47591 | No Perforce job exists for this issue. | 2 | 33387 | 7 years, 27 weeks, 3 days ago | Tools and Recipes for Monitoring ZooKeeper using Cacti, Nagios or Ganglia. |
Reviewed
|
monitoring, cacti, nagios, ganglia, contrib | 0|i062tb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-798 | ZOOKEEPER-789 Fixup loggraph for FLE changes |
Sub-task | Closed | Minor | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 29/Jun/10 09:01 | 23/Nov/11 14:22 | 05/Jul/10 15:59 | 3.4.0 | contrib | 0 | 0 | 47592 | No Perforce job exists for this issue. | 1 | 33388 | 9 years, 38 weeks, 2 days ago |
Reviewed
|
0|i062tj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-797 | c client source with AI_ADDRCONFIG cannot be compiled with early glibc |
Improvement | Closed | Major | Fixed | Qian Ye | Qian Ye | Qian Ye | 29/Jun/10 00:06 | 23/Nov/11 14:22 | 05/Jul/10 16:33 | 3.3.1 | 3.4.0 | c client | 0 | 0 | linux 2.6.9 | c client source with AI_ADDRCONFIG cannot be compiled with early glibc (before 2.3.3) | 47593 | No Perforce job exists for this issue. | 1 | 33389 | 9 years, 38 weeks, 2 days ago |
Reviewed
|
c client | 0|i062tr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-796 | zkServer.sh should support an external PIDFILE variable |
Bug | Closed | Major | Fixed | Alex Newman | Alex Newman | Alex Newman | 28/Jun/10 18:02 | 23/Nov/11 14:22 | 06/Jul/10 17:51 | 3.4.0 | scripts | 0 | 1 | So currently the pid file has to be tied to the data directory when starting zkServer.sh. It would be good to be able to break them up. | 47594 | No Perforce job exists for this issue. | 2 | 32863 | 9 years, 38 weeks, 1 day ago |
Reviewed
|
0|i05zkv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-795 | eventThread isn't shutdown after a connection "session expired" event coming |
Bug | Closed | Blocker | Fixed | Sergey Doroshenko | mathieu barcikowski | mathieu barcikowski | 28/Jun/10 06:12 | 13/Jun/13 14:28 | 17/Aug/10 16:05 | 3.3.1 | 3.3.2, 3.4.0 | java client | 0 | 4 | DOSGI-191 | ubuntu 10.04 | Hi, I noticed a problem with the eventThread in ClientCnxn.java. The eventThread isn't shut down after a connection "session expired" event arrives (i.e. it never receives the EventOfDeath). When a session timeout occurs and the session is marked as expired, the connection is fully closed (socket, SendThread...) except for the eventThread. As a result, if I create a new zookeeper object and connect through it, I get a zombie thread which will never be killed (as for the previous zookeeper object, the state is already closed, so calling close again doesn't do anything). So every time I create a new zookeeper connection after an expired session, I will have one more zombie EventThread. How to reproduce : - Start a zookeeper client connection in debug mode - Pause the JVM long enough for the expired event to occur - Watch the list of threads, for example with jvisualvm: the sendThread is successfully killed, but the EventThread waits indefinitely - If you reopen a new zookeeper connection and repeat the previous steps, another EventThread will be left in an infinite wait state |
47595 | No Perforce job exists for this issue. | 3 | 32864 | 9 years, 32 weeks, 2 days ago | 0|i05zl3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-794 | Callbacks are not invoked when the client is closed |
Bug | Closed | Blocker | Fixed | Alexis Midon | Alexis Midon | Alexis Midon | 25/Jun/10 22:47 | 23/Nov/11 14:22 | 20/Oct/10 20:47 | 3.3.1 | 3.3.2, 3.4.0 | java client | 0 | 4 | ZOOKEEPER-954, ZOOKEEPER-835 | I noticed that ZooKeeper has different behaviors when calling synchronous or asynchronous actions on a closed ZooKeeper client. Actually a synchronous call will throw a "session expired" exception while an asynchronous call will do nothing. No exception, no callback invocation. Actually, even if the EventThread receives the Packet with the session expired err code, the packet is never processed since the thread has been killed by the eventOfDeath. So the callback is not invoked. |
47596 | No Perforce job exists for this issue. | 7 | 32865 | 9 years, 23 weeks ago |
Reviewed
|
0|i05zlb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
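The behavior ZOOKEEPER-794 asks for — completing outstanding async callbacks with a session-expired code on close instead of silently dropping them — can be sketched as follows (all types here are illustrative stand-ins, not the real ClientCnxn or its packet queue):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.BiConsumer;

/**
 * Sketch: on close, drain the pending-callback queue and fail each packet
 * with SESSIONEXPIRED rather than killing the event thread first.
 * The constant mirrors KeeperException.Code.SESSIONEXPIRED's value.
 */
public class CallbackDrain {
    public static final int SESSIONEXPIRED = -112;

    static class Packet {
        final String path;
        final BiConsumer<Integer, String> callback; // (rc, path)

        Packet(String path, BiConsumer<Integer, String> cb) {
            this.path = path;
            this.callback = cb;
        }
    }

    private final Queue<Packet> pending = new ArrayDeque<>();

    public void submit(String path, BiConsumer<Integer, String> cb) {
        pending.add(new Packet(path, cb));
    }

    /** Every outstanding packet gets its callback invoked with an error code. */
    public void close() {
        Packet p;
        while ((p = pending.poll()) != null) {
            p.callback.accept(SESSIONEXPIRED, p.path);
        }
    }
}
```

This gives asynchronous callers the same signal the synchronous API already delivers via its "session expired" exception.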
| ZooKeeper | ZOOKEEPER-793 | ZOOKEEPER-775 Large-scale Pub/Sub System (C++ Client) |
Sub-task | Resolved | Major | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 25/Jun/10 10:39 | 05/May/11 07:59 | 05/May/11 07:59 | 0 | 1 | Write a c++ client for hedwig | 47597 | No Perforce job exists for this issue. | 1 | 33390 | 8 years, 47 weeks ago | 0|i062tz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-792 | zkpython memory leak |
Bug | Closed | Major | Fixed | Lei Zhang | Lei Zhang | Lei Zhang | 24/Jun/10 17:36 | 23/Nov/11 14:22 | 22/Aug/10 22:59 | 3.3.1 | 3.3.2, 3.4.0 | contrib-bindings | 0 | 1 | vmware workstation - guest OS:Linux python:2.4.3 | We recently upgraded zookeeper from 3.2.1 to 3.3.1, now we are seeing less client deadlock on session expiration, which is a definite plus! Unfortunately we are seeing memory leak that requires our zk clients to be restarted every half-day. Valgrind result: ==8804== 25 (12 direct, 13 indirect) bytes in 1 blocks are definitely lost in loss record 255 of 670 ==8804== at 0x4021C42: calloc (vg_replace_malloc.c:418) ==8804== by 0x5047B42: parse_acls (zookeeper.c:369) ==8804== by 0x5047EF6: pyzoo_create (zookeeper.c:1009) ==8804== by 0x40786CC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0) ==8804== by 0x40B31DC: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0) ==8804== by 0x40B4485: PyEval_EvalCodeEx (in /usr/lib/libpython2.4.so.1.0) |
47598 | No Perforce job exists for this issue. | 3 | 32866 | 9 years, 28 weeks, 1 day ago |
Reviewed
|
0|i05zlj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-791 | Watches get triggered during client's reconnection |
Bug | Open | Minor | Unresolved | Unassigned | Sergey Doroshenko | Sergey Doroshenko | 24/Jun/10 08:20 | 24/Jun/10 08:22 | 0 | 1 | I start 2 of 3 servers of an ensemble, connect to it with zkCli.sh, do "ls / 1" which registers a watch. Then I kill one of the 2 servers, which makes the remaining one lose quorum and forces the client to reconnect. And when the client connects to this surviving server (but gets quickly dropped by the server afterwards), the watch is triggered: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/ I can reproduce it only with the command-line client, and quite rarely. I tried to write a unit test, but it didn't catch this. Has anybody seen this before? |
214171 | No Perforce job exists for this issue. | 1 | 32867 | 9 years, 40 weeks ago | 0|i05zlr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-790 | Last processed zxid set prematurely while establishing leadership |
Bug | Closed | Blocker | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 22/Jun/10 12:47 | 23/Nov/11 14:22 | 29/Jul/10 17:11 | 3.3.1 | 3.3.2, 3.4.0 | quorum | 0 | 3 | ZOOKEEPER-335 | The leader code is setting the last processed zxid to the first of the new epoch even before connecting to a quorum of followers. Because the leader code sets this value before connecting to a quorum of followers (Leader.java:281) and the follower code throws an IOException (Follower.java:73) if the leader epoch is smaller, we have that when the false leader drops leadership and becomes a follower, it finds a smaller epoch and kills itself. | 47599 | No Perforce job exists for this issue. | 14 | 32868 | 9 years, 34 weeks, 6 days ago |
Reviewed
|
0|i05zlz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-789 | Improve FLE log messages |
Improvement | Closed | Major | Fixed | Flavio Paiva Junqueira | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 18/Jun/10 17:01 | 23/Nov/11 14:22 | 05/Jul/10 15:53 | 3.3.1 | 3.3.2, 3.4.0 | 0 | 0 | ZOOKEEPER-798 | Notification messages are quite important to determine what is going on with leader election. The main idea of this improvement is to name the fields we output in notification log messages. | 47600 | No Perforce job exists for this issue. | 4 | 33391 | 9 years, 38 weeks, 2 days ago |
Reviewed
|
0|i062u7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-788 | Add server id to message logs |
Improvement | Closed | Trivial | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 17/Jun/10 12:45 | 23/Nov/11 14:21 | 25/Jun/10 16:11 | 3.3.1 | 3.4.0 | contrib | 0 | 0 | As discussed on IRC. The log visualisation needs some way of determining which server made which log. If the log segment is taken for a time period where no elections take place, there is no way to determine the id of the server. | 47601 | No Perforce job exists for this issue. | 1 | 33392 | 9 years, 39 weeks, 6 days ago |
Reviewed
|
0|i062uf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-787 | groupId in deployed pom is wrong |
Bug | Closed | Blocker | Fixed | Unassigned | Chris Conrad | Chris Conrad | 10/Jun/10 12:46 | 23/Nov/11 14:22 | 15/Sep/10 11:39 | 3.3.1 | 3.3.2, 3.4.0 | 1 | 2 | ZOOKEEPER-813 | The pom deployed to repo1.maven.org has the project declared like this: <groupId>org.apache.zookeeper</groupId> <artifactId>zookeeper</artifactId> <packaging>jar</packaging> <version>3.3.1</version> But it is deployed here: http://repo2.maven.org/maven2/org/apache/hadoop/zookeeper/3.3.1 So either the groupId needs to change or the location it is deployed to needs to be changed because having them different results in bad behavior. If you specify the correct groupId in your own pom/ivy files you can't even download zookeeper because it's not where your pom says it is and if you use the "incorrect" groupId then you can download zookeeper but then ivy complains about: [error] :: problems summary :: [error] :::: ERRORS [error] public: bad organisation found in http://repo1.maven.org/maven2/org/apache/hadoop/zookeeper/3.3.1/zookeeper-3.3.1.pom: expected='org.apache.hadoop' found='org.apache.zookeeper' |
47602 | No Perforce job exists for this issue. | 0 | 32869 | 9 years, 28 weeks, 1 day ago |
Reviewed
|
0|i05zm7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-786 | Exception in ZooKeeper.toString |
Bug | Resolved | Minor | Fixed | Thomas Koch | Stephen Green | Stephen Green | 04/Jun/10 17:25 | 16/Oct/11 21:00 | 16/Oct/11 21:00 | 3.3.1 | 3.5.0 | java client | 1 | 2 | Mac OS X, x86 | When trying to call ZooKeeper.toString during client disconnections, an exception can be generated: [04/06/10 15:39:57.744] ERROR Error while calling watcher java.lang.Error: java.net.SocketException: Socket operation on non-socket at sun.nio.ch.Net.localAddress(Net.java:128) at sun.nio.ch.SocketChannelImpl.localAddress(SocketChannelImpl.java:430) at sun.nio.ch.SocketAdaptor.getLocalAddress(SocketAdaptor.java:147) at java.net.Socket.getLocalSocketAddress(Socket.java:717) at org.apache.zookeeper.ClientCnxn.getLocalSocketAddress(ClientCnxn.java:227) at org.apache.zookeeper.ClientCnxn.toString(ClientCnxn.java:183) at java.lang.String.valueOf(String.java:2826) at java.lang.StringBuilder.append(StringBuilder.java:115) at org.apache.zookeeper.ZooKeeper.toString(ZooKeeper.java:1486) at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2794) at java.util.Formatter$FormatSpecifier.print(Formatter.java:2677) at java.util.Formatter.format(Formatter.java:2433) at java.util.Formatter.format(Formatter.java:2367) at java.lang.String.format(String.java:2769) at com.echonest.cluster.ZooContainer.process(ZooContainer.java:544) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488) Caused by: java.net.SocketException: Socket operation on non-socket at sun.nio.ch.Net.localInetAddress(Native Method) at sun.nio.ch.Net.localAddress(Net.java:125) ... 15 more |
19329 | No Perforce job exists for this issue. | 1 | 32870 | 8 years, 26 weeks ago |
Reviewed
|
0|i05zmf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-785 | Zookeeper 3.3.1 shouldn't infinite loop if someone creates a server.0 line |
Bug | Closed | Major | Fixed | Patrick D. Hunt | Alex Newman | Alex Newman | 02/Jun/10 18:51 | 23/Nov/11 14:22 | 14/Sep/10 17:09 | 3.3.1 | 3.3.2, 3.4.0 | server | 0 | 1 | Tested in linux with a new jvm | The following config causes an infinite loop [zoo.cfg] tickTime=2000 dataDir=/var/zookeeper/ clientPort=2181 initLimit=10 syncLimit=5 server.0=localhost:2888:3888 Output: 2010-06-01 16:20:32,471 - INFO [main:QuorumPeerMain@119] - Starting quorum peer 2010-06-01 16:20:32,489 - INFO [main:NIOServerCnxn$Factory@143] - binding to port 0.0.0.0/0.0.0.0:2181 2010-06-01 16:20:32,504 - INFO [main:QuorumPeer@818] - tickTime set to 2000 2010-06-01 16:20:32,504 - INFO [main:QuorumPeer@829] - minSessionTimeout set to -1 2010-06-01 16:20:32,505 - INFO [main:QuorumPeer@840] - maxSessionTimeout set to -1 2010-06-01 16:20:32,505 - INFO [main:QuorumPeer@855] - initLimit set to 10 2010-06-01 16:20:32,526 - INFO [main:FileSnap@82] - Reading snapshot /var/zookeeper/version-2/snapshot.c 2010-06-01 16:20:32,547 - INFO [Thread-1:QuorumCnxManager$Listener@436] - My election bind port: 3888 2010-06-01 16:20:32,554 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@620] - LOOKING 2010-06-01 16:20:32,556 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@649] - New election. 
My id = 0, Proposed zxid = 12 2010-06-01 16:20:32,558 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@689] - Notification: 0, 12, 1, 0, LOOKING, LOOKING, 0 2010-06-01 16:20:32,560 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@623] - Unexpected exception java.lang.NullPointerException at org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621) 2010-06-01 16:20:32,560 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@620] - LOOKING 2010-06-01 16:20:32,560 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@649] - New election. My id = 0, Proposed zxid = 12 2010-06-01 16:20:32,561 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@689] - Notification: 0, 12, 2, 0, LOOKING, LOOKING, 0 2010-06-01 16:20:32,561 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@623] - Unexpected exception java.lang.NullPointerException at org.apache.zookeeper.server.quorum.FastLeaderElection.totalOrderPredicate(FastLeaderElection.java:496) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:709) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:621) 2010-06-01 16:20:32,561 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@620] - LOOKING 2010-06-01 16:20:32,562 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@649] - New election. My id = 0, Proposed zxid = 12 2010-06-01 16:20:32,562 - INFO [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@689] - Notification: 0, 12, 3, 0, LOOKING, LOOKING, 0 2010-06-01 16:20:32,562 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@623] - Unexpected exception java.lang.NullPointerException Things like HBase require that the zookeeper servers be listed in the zoo.cfg. 
This is a bug on their part, but zookeeper shouldn't hit a null pointer exception in a loop. |
47603 | No Perforce job exists for this issue. | 8 | 32871 | 9 years, 28 weeks, 1 day ago |
Reviewed
|
0|i05zmn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
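The defensive fix ZOOKEEPER-785 implies — failing fast on a config with exactly one server.N line rather than looping on a NullPointerException during leader election — might look like this (a sketch only, not the actual QuorumPeerConfig code):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch: validate zoo.cfg contents before starting quorum mode. A lone
 * server.N entry should produce a clear configuration error, since neither
 * standalone mode (zero entries) nor a real ensemble (several) applies.
 */
public class QuorumConfigCheck {
    public static List<String> serverLines(List<String> cfgLines) {
        List<String> servers = new ArrayList<>();
        for (String line : cfgLines) {
            if (line.trim().startsWith("server.")) {
                servers.add(line.trim());
            }
        }
        return servers;
    }

    public static void validate(List<String> cfgLines) {
        if (serverLines(cfgLines).size() == 1) {
            throw new IllegalArgumentException(
                "A single server.N entry is invalid: remove it to run standalone, "
              + "or configure an ensemble of 3 or more servers");
        }
    }
}
```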
| ZooKeeper | ZOOKEEPER-784 | ZOOKEEPER-704 server-side functionality for read-only mode |
Sub-task | Closed | Major | Fixed | Sergey Doroshenko | Sergey Doroshenko | Sergey Doroshenko | 02/Jun/10 18:16 | 11/Mar/14 04:35 | 19/May/11 19:45 | 3.4.0 | server | 0 | 4 | ZOOKEEPER-1349 | As per http://wiki.apache.org/hadoop/ZooKeeper/GSoCReadOnlyMode , create ReadOnlyZooKeeperServer which comes into play when peer is partitioned. | 47604 | No Perforce job exists for this issue. | 13 | 33393 | 8 years, 44 weeks, 5 days ago | 0|i062un: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-783 | committedLog in ZKDatabase is not properly synchronized |
Bug | Closed | Critical | Fixed | Henry Robinson | Henry Robinson | Henry Robinson | 01/Jun/10 15:22 | 23/Nov/11 14:22 | 26/Jul/10 18:45 | 3.3.1 | 3.3.2, 3.4.0 | server | 1 | 1 | ZKDatabase.getCommittedLog() returns a reference to the LinkedList<Proposal> committedLog in ZKDatabase. This is then iterated over by at least one caller. I have seen a bug that causes a NPE in LinkedList.clear on committedLog, which I am pretty sure is due to the lack of synchronization. This bug has not been apparent in normal ZK operation, but in code that I have that starts and stops a ZK server in process repeatedly (clear() is called from ZooKeeperServerMain.shutdown()). It's better style to defensively copy the list in getCommittedLog, and to synchronize on the list in ZKDatabase.clear. |
47605 | No Perforce job exists for this issue. | 1 | 32872 | 9 years, 35 weeks, 2 days ago |
Reviewed
|
0|i05zmv: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
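The two remedies proposed in ZOOKEEPER-783 — synchronizing on committedLog and defensively copying it in the getter — can be sketched as follows (Proposal is a stand-in type; this is not the real ZKDatabase):

```java
import java.util.LinkedList;
import java.util.List;

/** Sketch of a committedLog that is safe to iterate and clear concurrently. */
public class ZKDatabaseSketch {
    public static class Proposal {}

    private final LinkedList<Proposal> committedLog = new LinkedList<>();

    /** Return a defensive copy so callers can iterate without holding the lock. */
    public List<Proposal> getCommittedLog() {
        synchronized (committedLog) {
            return new LinkedList<>(committedLog);
        }
    }

    public void add(Proposal p) {
        synchronized (committedLog) {
            committedLog.add(p);
        }
    }

    /** clear() now contends on the same lock, so it can't race an iteration. */
    public void clear() {
        synchronized (committedLog) {
            committedLog.clear();
        }
    }
}
```

Handing out a copy means a concurrent clear() can no longer invalidate a caller's iterator, which is the NPE scenario the report describes.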
| ZooKeeper | ZOOKEEPER-782 | Incorrect C API documentation for Watches |
Bug | Closed | Trivial | Fixed | Mahadev Konar | Dave Wright | Dave Wright | 31/May/10 16:47 | 23/Nov/11 14:22 | 14/Jul/11 13:54 | 3.3.1 | 3.4.0 | c client, documentation | 0 | 2 | The C API Doxygen documentation states: " .... If the client is ever disconnected from the service, even if the disconnection is temporary, the watches of the client will be removed from the service, so a client must treat a disconnect notification as an implicit trigger of all outstanding watches." This is incorrect as of v.3. Watches are only lost and need to be re-registered when a session times out. When a normal disconnection occurs watches are reset automatically on reconnection. The documentation in zookeeper.h needs to be updated to correct this explanation. |
47606 | No Perforce job exists for this issue. | 1 | 32873 | 8 years, 36 weeks, 6 days ago | Corrected documentation on watch behavior in C API |
Reviewed
|
0|i05zn3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-781 | provide a generalized "connection strategy" for ZooKeeper clients |
New Feature | Open | Major | Unresolved | Qian Ye | Patrick D. Hunt | Patrick D. Hunt | 26/May/10 14:00 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | c client, java client | 1 | 2 | ZOOKEEPER-836, ZOOKEEPER-779 | A connection strategy allows control over the way that ZooKeeper clients (we would implement this for both c and java apis) connect to a serving ensemble. Today we have two strategies, randomized round robin (default) and ordered round robin, both of which are hard coded into the client implementation. We would generalize this interface and allow users to create their own. See this page for more detail: http://wiki.apache.org/hadoop/ZooKeeper/ConnectionStrategy |
66777 | No Perforce job exists for this issue. | 7 | 42145 | 8 years, 35 weeks, 1 day ago | a draft patch for c client | 0|i07kun: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-780 | zkCli.sh generates a ArrayIndexOutOfBoundsException |
Bug | Resolved | Minor | Invalid | Unassigned | Miguel Correia | Miguel Correia | 25/May/10 06:22 | 24/Apr/14 19:52 | 24/Apr/14 19:52 | 3.3.1 | 3.5.0 | scripts | 0 | 3 | Linux Ubuntu running in VMPlayer on top of Windows XP | I'm starting to play with Zookeeper so I'm still running it in standalone mode. This is not a big issue, but here it goes for the record. I've run zkCli.sh to run some commands in the server. I created a znode /groups. When I tried to create a znode client_1 inside /groups, I forgot to include the data: an exception was generated and zkCli.sh crashed, instead of just showing an error. I tried a few variations and it seems like the problem is not including the data. A copy of the screen: [zk: localhost:2181(CONNECTED) 3] create /groups firstgroup Created /groups [zk: localhost:2181(CONNECTED) 4] create -e /groups/client_1 Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 3 at org.apache.zookeeper.ZooKeeperMain.processZKCmd(ZooKeeperMain.java:678) at org.apache.zookeeper.ZooKeeperMain.processCmd(ZooKeeperMain.java:581) at org.apache.zookeeper.ZooKeeperMain.executeLine(ZooKeeperMain.java:353) at org.apache.zookeeper.ZooKeeperMain.run(ZooKeeperMain.java:311) at org.apache.zookeeper.ZooKeeperMain.main(ZooKeeperMain.java:270) |
70800 | No Perforce job exists for this issue. | 3 | 32874 | 5 years, 48 weeks ago | If no data is provided for the new node when using the "create" zkCli.sh command assume an empty byte array. | 0|i05znb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
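The resolution recorded above ("If no data is provided... assume an empty byte array") amounts to bounds-checking the command arguments before indexing them. A minimal sketch of that idea in Python — the real fix lives in ZooKeeperMain.java, and this parser function is hypothetical:

```python
def parse_create_args(args):
    """Parse a zkCli-style "create [-e] path [data]" argument list.

    The original crash indexed the data argument unconditionally; per the
    recorded resolution, a missing data argument defaults to an empty byte
    array instead of raising.
    """
    args = list(args)
    ephemeral = False
    if args and args[0] == "-e":
        ephemeral = True
        args = args[1:]
    if not args:
        raise ValueError("create requires a path")
    path = args[0]
    # The fix: bounds-check before indexing, defaulting to b"".
    data = args[1].encode("utf-8") if len(args) > 1 else b""
    return path, data, ephemeral
```

So `create -e /groups/client_1` with no data argument yields an empty payload rather than an out-of-bounds access.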
| ZooKeeper | ZOOKEEPER-779 | C Client should check the connectivity to the hosts in zookeeper_init |
Improvement | Open | Major | Unresolved | Unassigned | Qian Ye | Qian Ye | 22/May/10 23:00 | 26/May/10 14:00 | 3.3.1 | c client | 0 | 0 | ZOOKEEPER-781 | In some scenarios, whether the client can connect to zookeeper servers is used as a logic condition. If the client cannot connect to the servers, the program should turn to another fork. However, the current zookeeper_init cannot tell whether the client can connect to one server or not. It could make some users feel confused. I think we should check the connectivity to the host in zookeeper_init, so we can tell whether the hosts are available at that time or not. | 214170 | No Perforce job exists for this issue. | 2 | 42146 | 9 years, 44 weeks, 1 day ago | 0|i07kuv: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-778 | ability to add a watch on a setData or create call |
Improvement | Open | Minor | Unresolved | Unassigned | Woody Anderson | Woody Anderson | 22/May/10 16:22 | 15/Nov/19 19:56 | c client, java client, server | 1 | 2 | It is often desirable to set a watch when creating a node or setting data on a node. Currently, you have to add a watch after the create/set with another API call, which incurs extra cost, and a window of unobserved state change. This would "seem" to be an easy addition to the server/client libs, but I'm not sure if there are reasons this was never proposed or developed. I currently am most concerned with a data watch in these two scenarios, but I would imagine other users might be interested in registering a children watch immediately upon creation. This change would require adding new method signatures in the clients for create and setData that take watchers, and some changes to the protocol, as the SetDataRequest and CreateRequest objects would need watch flags. |
feature | 62683 | No Perforce job exists for this issue. | 0 | 42147 | 17 weeks, 5 days ago | 0|i07kv3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-777 | setting acl on a non-existent node should return no node error |
Bug | Resolved | Major | Invalid | Unassigned | Kapil Thangavelu | Kapil Thangavelu | 21/May/10 11:52 | 18/Nov/11 20:07 | 18/Nov/11 20:07 | 3.3.0, 3.3.1 | server | 0 | 0 | currently it just returns successfully, but the acl can't be retrieved, and if any value is being stored, its overwritten when the node is created. | 47607 | No Perforce job exists for this issue. | 1 | 32875 | 8 years, 18 weeks, 5 days ago | 0|i05znj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-776 | API should sanity check sessionTimeout argument |
Improvement | Patch Available | Minor | Unresolved | Raúl Gutiérrez Segalés | Gregory Haskins | Gregory Haskins | 21/May/10 10:12 | 05/Feb/20 07:12 | 3.2.2, 3.3.0, 3.3.1, 3.4.6, 3.5.0 | 3.7.0, 3.5.8 | c client, java client | 0 | 3 | OSX 10.6.3, JVM 1.6.0-20 | passing in a "0" sessionTimeout to the ZooKeeper() constructor leads to errors in subsequent operations. It would be ideal to capture this configuration error at the source by throwing something like an IllegalArgumentException when the bogus sessionTimeout is specified, instead of later when it is utilized. | 70786 | No Perforce job exists for this issue. | 4 | 42148 | 3 years, 39 weeks, 2 days ago | 0|i07kvb: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
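The sanity check ZOOKEEPER-776 asks for is a small fail-fast guard at construction time. A sketch of the idea in Python with a hypothetical helper name — the actual patch would live in the Java ZooKeeper constructor and in zookeeper_init:

```python
def validate_session_timeout(session_timeout_ms, min_ms=1):
    """Reject a bogus session timeout at construction time (the behaviour
    this issue requests) instead of letting later operations fail."""
    if not isinstance(session_timeout_ms, int) or isinstance(session_timeout_ms, bool):
        raise TypeError("sessionTimeout must be an integer number of milliseconds")
    if session_timeout_ms < min_ms:
        raise ValueError(
            "sessionTimeout must be >= %d ms, got %d" % (min_ms, session_timeout_ms))
    return session_timeout_ms
```

With this in place, the reporter's "0" sessionTimeout fails immediately with a clear message rather than producing confusing errors downstream.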
| ZooKeeper | ZOOKEEPER-775 | A large scale pub/sub system |
New Feature | Closed | Major | Fixed | Benjamin Reed | Benjamin Reed | Benjamin Reed | 18/May/10 00:21 | 23/Nov/11 14:22 | 19/Aug/10 17:29 | 3.4.0 | contrib | 0 | 15 | ZOOKEEPER-793 | we have developed a large scale pub/sub system based on ZooKeeper and BookKeeper. | 47608 | No Perforce job exists for this issue. | 8 | 33394 | 9 years, 28 weeks, 1 day ago | A pub sub system using BooKkeeper and ZooKeeper with C++ and Java client bindings. |
Reviewed
|
0|i062uv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-774 | Recipes tests are slightly outdated: they do not compile against JUnit 4.8 |
Bug | Closed | Minor | Fixed | Sergey Doroshenko | Sergey Doroshenko | Sergey Doroshenko | 12/May/10 17:31 | 23/Nov/11 14:22 | 14/May/10 19:32 | 3.3.0 | 3.4.0 | recipes | 0 | 1 | As title | 47609 | No Perforce job exists for this issue. | 1 | 32876 | 9 years, 43 weeks, 2 days ago |
Reviewed
|
0|i05znr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-773 | Log visualisation |
Improvement | Closed | Minor | Fixed | Ivan Kelly | Ivan Kelly | Ivan Kelly | 11/May/10 11:20 | 23/Nov/11 14:22 | 09/Jun/10 11:27 | 3.4.0 | contrib | 11/Oct/10 | 0 | 0 | Zkgraph is a log viewer for zookeeper. It can handle transaction logs and message logs. There are currently two views. a) Server view The server view shows the interactions between the different servers in an ensemble. The X axis represents time. * Exceptions show up as red dots. Hovering your mouse over them will give you more details of the exception * The colour of the line represents the election state of the server. - orange means LOOKING for leader - dark green means the server is the leader - light green means the server is following a leader - yellow means there isn't enough information to determine the state of the server. * The gray arrows denote election messages between servers. Pink dashed arrows are messages that were sent but never delivered. b) Session view The session view shows the lifetime of sessions on a server. Use the time filter to narrow down the view. Any more than about 2000 events will take a long time to view in your browser. The Y axis represents time in this case. Each line is a session. The black dots represent events on the session. You can click on the black dots for more details of the event. 2 - Compiling & Running Run "ant jar" in src/contrib/zkgraph/. This will download all dependencies and compile all the zkgraph code. Once compilation has finished, you can run it with the zkgraph.sh script in src/contrib/zkgraph/bin. This will start an embedded web server on your machine. Navigate to http://localhost:8182/graph/main.html. |
47610 | No Perforce job exists for this issue. | 3 | 33395 | 9 years, 42 weeks ago |
Reviewed
|
0|i062v3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-772 | zkpython segfaults when watcher from async get children is invoked. |
Bug | Closed | Major | Fixed | Henry Robinson | Kapil Thangavelu | Kapil Thangavelu | 10/May/10 10:42 | 23/Nov/11 14:22 | 11/Aug/10 14:31 | 3.3.2, 3.4.0 | contrib-bindings | 0 | 1 | ubuntu lucid (10.04) / zk trunk | When utilizing the zkpython async get children api with a watch, i consistently get segfaults when the watcher is invoked to process events. | 47611 | No Perforce job exists for this issue. | 4 | 32877 | 9 years, 33 weeks, 1 day ago |
Reviewed
|
0|i05znz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-771 | zkpython return without exception set on invalid auth scheme |
Bug | Open | Minor | Unresolved | Unassigned | Kapil Thangavelu | Kapil Thangavelu | 07/May/10 15:51 | 07/May/10 16:07 | contrib-bindings | 0 | 1 | ubuntu lucid | If you attempt to utilize an invalid auth scheme when adding authentication, you'll end up with an error return value in your callback. But the handle itself will be hosed; attempting to use it with any part of the API returns: SystemError: error return without exception set |
214169 | No Perforce job exists for this issue. | 1 | 32878 | 9 years, 46 weeks, 6 days ago | 0|i05zo7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-770 | Slow add_auth calls with multi-threaded client |
Bug | Patch Available | Major | Unresolved | Craig Calef | Kapil Thangavelu | Kapil Thangavelu | 06/May/10 15:50 | 05/Feb/20 07:11 | 3.3.0, 3.3.3, 3.4.0 | 3.7.0, 3.5.8 | c client, contrib-bindings | 1 | 8 | MESOS-4157, MESOS-4648 | ubuntu lucid (10.04), zk trunk (3.4) | Calls to add_auth are a bit slow from the c client library. The auth callback typically takes multiple seconds to fire. I instrumented the java, c binding, and python binding with a few log statements to find out where the slowness was occurring ( http://bazaar.launchpad.net/~hazmat/zookeeper/fast-auth-instrumented/revision/647). It looks like when the io thread polls, it doesn't register interest in the incoming packet, so the auth success message from the server and the auth callback are only processed when the poll times out. I tried modifying mt_adapter.c so the poll registers interest in both events; this causes considerably more wakeups but it does address the issue of making add_auth fast. I think the ideal solution would be some sort of additional auth handshake state on the handle, that zookeeper_interest could utilize to suggest both POLLIN|POLLOUT are wanted for subsequent calls to poll during the auth handshake handle state. I'm attaching a script that takes 13s or 1.6s for the auth callback depending on the session time out value (which in turn figures into the calculation of the poll timeout). |
67869 | No Perforce job exists for this issue. | 5 | 32879 | 3 years, 39 weeks, 2 days ago | 0|i05zof: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-769 | Leader can treat observers as quorum members |
Bug | Closed | Major | Fixed | Sergey Doroshenko | Sergey Doroshenko | Sergey Doroshenko | 06/May/10 14:01 | 23/Nov/11 14:22 | 21/May/10 12:23 | 3.3.0 | 3.4.0 | 0 | 4 | ZOOKEEPER-704 | Ubuntu Karmic x64 | In short: it seems the leader can treat observers as quorum members. Steps to repro: 1. Server configuration: 3 voters, 2 observers (attached). 2. Bring up 2 voters and one observer. It's enough for quorum. 3. Shut down the one from the quorum who is the follower. As I understand, the expected result is that the leader will start a new election round to regain quorum. But the real situation is that it just says goodbye to that follower, and is still operable. (When I shut down the 3rd one -- the observer -- the leader starts trying to regain a quorum). (Expectedly, if on step 3 we shut down the leader, not the follower, the remaining follower starts a new leader election, as it should). |
47612 | No Perforce job exists for this issue. | 7 | 32880 | 9 years, 43 weeks, 2 days ago |
Reviewed
|
0|i05zon: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-768 | zkpython segfault on close (assertion error in io thread) |
Bug | Open | Major | Unresolved | Unassigned | Kapil Thangavelu | Kapil Thangavelu | 06/May/10 12:05 | 06/May/10 18:13 | 3.4.0 | contrib-bindings | 0 | 0 | ZOOKEEPER-707 | ubuntu lucid (10.04), zookeeper trunk (java/c/zkpython) | While trying to create a test case showing slow average add_auth, I stumbled upon a test case that reliably segfaults for me, albeit with a variable number of iterations (anywhere from 0 to 20 typically). fwiw, I've got about 220 processes in my test environment (ubuntu lucid 10.04). The test case opens a connection, adds authentication to it, and closes the connection, in a loop. I'm including the sample program and the gdb stack traces from the core file. I can upload the core file if that's helpful. | 214168 | No Perforce job exists for this issue. | 4 | 32881 | 9 years, 47 weeks ago | 0|i05zov: ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-767 | Submitting Demo/Recipe Shared / Exclusive Lock Code |
Improvement | Resolved | Minor | Won't Fix | Sam Baskinger | Sam Baskinger | Sam Baskinger | 05/May/10 16:32 | 15/May/13 14:20 | 15/May/13 14:20 | 3.3.0 | 3.5.0 | recipes | 0 | 5 | 28800 | Networked Insights would like to share-back some code for shared/exclusive locking that we are using in our labs. | 100% | 100% | 28800 | 41 | No Perforce job exists for this issue. | 6 | 42149 | 6 years, 45 weeks, 1 day ago | New recipe code. | 0|i07kvj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-766 | forrest recipes docs don't mention the lock/queue recipe implementations available in the release |
Bug | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 05/May/10 15:04 | 23/Nov/11 14:22 | 05/May/10 18:51 | 3.3.1, 3.4.0 | documentation, recipes | 0 | 1 | Update the forrest recipes docs to point to the recipe implementations (where available). | 47613 | No Perforce job exists for this issue. | 1 | 32882 | 9 years, 47 weeks, 1 day ago |
Reviewed
|
0|i05zp3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-765 | Add python example script |
Improvement | Closed | Minor | Fixed | Andrei Savu | Travis Crawford | Travis Crawford | 04/May/10 18:59 | 21/May/12 03:17 | 27/Jul/10 11:12 | 3.4.0 | contrib-bindings, documentation | 0 | 3 | ZOOKEEPER-395 | When adding some zookeeper-based functionality to a python script I had to figure everything out without guidance, which while doable, would have been a lot easier with an example. I extracted a skeleton program structure out with hopes it's useful to others (maybe add as an example in the source or wiki?). This script does an aget() and sets a watch, and hopefully illustrates what's going on, and where to plug in your application code that gets run when the znode changes. There are probably some bugs; if we fix them now and provide a well-reviewed example, hopefully others will not make the same mistakes. |
47614 | No Perforce job exists for this issue. | 3 | 33396 | 9 years, 35 weeks, 1 day ago | A skeleton script that shows how to setup znode watches and how to react to events using the Python client libraries. |
Reviewed
|
0|i062vb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-764 | Observer elected leader due to inconsistent voting view |
Bug | Closed | Major | Fixed | Henry Robinson | Flavio Paiva Junqueira | Flavio Paiva Junqueira | 04/May/10 17:41 | 23/Nov/11 14:22 | 05/May/10 18:27 | 3.3.1, 3.4.0 | quorum | 0 | 1 | ZOOKEEPER-690 | In ZOOKEEPER-690, we noticed that an observer was being elected, and Henry proposed a patch to fix the issue. However, it seems that the patch does not solve the issue one user (Alan Cabrera) has observed. Given that we would like to fix this issue, and to work separately with Alan to determine the problem with his setup, I'm creating this jira and re-posting Henry's patch. | 47615 | No Perforce job exists for this issue. | 2 | 32883 | 9 years, 47 weeks, 1 day ago |
Reviewed
|
0|i05zpb: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-763 | Deadlock on close w/ zkpython / c client |
Bug | Closed | Major | Fixed | Henry Robinson | Kapil Thangavelu | Kapil Thangavelu | 04/May/10 09:09 | 23/Nov/11 14:22 | 05/May/10 18:02 | 3.3.0 | 3.3.1, 3.4.0 | contrib-bindings | 0 | 1 | ubuntu 10.04, zookeeper 3.3.0 and trunk | deadlocks occur if we attempt to close a handle while there are any outstanding async requests (aget, acreate, etc). Normally on close both the io thread and the completion thread are terminated and joined, however with outstanding async requests, the completion thread won't be in a joinable state, and we effectively hang when the main thread does the join. afaics the ideal behavior on close of a handle would be to clear out any remaining callbacks and let the completion thread terminate. I've tried adding some bookkeeping within a python client to guard against closing while there is an outstanding async completion request, but it's an imperfect solution since even after the python callback is executed there is still a window for deadlock before the completion thread finishes the callback. A simple example to reproduce the deadlock is attached. |
47616 | No Perforce job exists for this issue. | 5 | 32884 | 9 years, 47 weeks ago |
Reviewed
|
0|i05zpj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-762 | ZOOKEEPER-107 Allow dynamic addition/removal of server nodes in the client API |
Sub-task | Resolved | Minor | Duplicate | Unassigned | Dave Wright | Dave Wright | 03/May/10 16:54 | 29/Dec/12 18:47 | 29/Dec/12 18:47 | 3.5.0 | c client, java client | 1 | 9 | ZOOKEEPER-836, ZOOKEEPER-1355 | Currently the list of zookeeper servers needs to be provided to the client APIs at construction time, and cannot be changed without a complete shutdown/restart of the client API. However, there are scenarios that require the server list to be updated, such as removal or addition of a ZK cluster node, and it would be nice if the list could be updated via a simple API call. The general approach (in the Java client) would be to add "RemoveServer()/AddServer()" functions to ZooKeeper that call down to ClientCnxn, where they are just maintained in a list. Of course if the server being removed is the one currently connected, we'd need to disconnect, but a simple call to disconnect() seems like it would resolve that and trigger the automatic re-connection logic. An equivalent change could be made in the C code. This change would also make dynamic cluster membership in ZOOKEEPER-107 easier to implement. |
214167 | No Perforce job exists for this issue. | 1 | 42150 | 7 years, 12 weeks, 5 days ago | 0|i07kvr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-761 | Remove *synchronous* calls from the *single-threaded* C client API, since they are documented not to work |
Improvement | Resolved | Blocker | Fixed | Benjamin Reed | Jozef Hatala | Jozef Hatala | 29/Apr/10 18:14 | 30/Jan/19 08:05 | 25/Mar/18 21:31 | 3.1.1, 3.2.2 | 3.5.3, 3.6.0 | c client | 0 | 7 | 0 | 600 | ZOOKEEPER-2640 | RHEL 4u8 (Linux). The issue is not OS-specific though. | Since the synchronous calls are [known|http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#Using+the+C+Client] to be unimplemented in the single threaded version of the client library libzookeeper_st.so, I believe that it would be helpful towards users of the library if that information was also obvious from the header file. Anecdotally more than one of us here made the mistake of starting by using the synchronous calls with the single-threaded library, and we found ourselves debugging it. An early warning would have been greatly appreciated. 1. Could you please add warnings to the doxygen blocks of all synchronous calls saying that they are not available in the single-threaded API. This cannot be safely done with {{#ifdef THREADED}}, obviously, because the same header file is included whichever client library implementation one is compiling for. 2. Could you please bracket the implementation of all synchronous calls in zookeeper.c with {{#ifdef THREADED}} and {{#endif}}, so that those symbols are not present in libzookeeper_st.so? |
100% | 100% | 600 | 0 | pull-request-available | 67868 | No Perforce job exists for this issue. | 2 | 42151 | 1 year, 51 weeks, 3 days ago | Removed synchronous calls from the single-threaded API as they are not implemented and documented as such. | 0|i07kvz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-760 | Improved string encoding and decoding performance |
Improvement | Open | Major | Unresolved | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 29/Apr/10 14:46 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | java client, server | 0 | 1 | Our marshaling code converts strings to utf8 bytes, this can be optimized, see: https://issues.apache.org/jira/browse/AVRO-532 |
70794 | No Perforce job exists for this issue. | 0 | 42152 | 9 years, 48 weeks ago | 0|i07kw7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-759 | Stop accepting connections when close to file descriptor limit |
Improvement | Open | Major | Unresolved | Unassigned | Travis Crawford | Travis Crawford | 29/Apr/10 14:02 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | server | 0 | 6 | Zookeeper always tries to accept new connections, throwing an exception if out of file descriptors. An improvement would be denying new client connections when close to the limit. Additionally, file-descriptor limits+usage should be exported to the monitoring four-letter word, should that get implemented (see ZOOKEEPER-744). DETAILS A Zookeeper ensemble I administer recently suffered an outage when one node was restarted with the low system-default ulimit of 1024 file descriptors and later ran out. File descriptor usage+max are already being monitored by the following MBeans: - java.lang.OperatingSystem.MaxFileDescriptorCount - java.lang.OperatingSystem.OpenFileDescriptorCount They're described (rather tersely) at: http://java.sun.com/javase/6/docs/jre/api/management/extension/com/sun/management/UnixOperatingSystemMXBean.html This feature request is for the following: (a) Stop accepting new connections when OpenFileDescriptorCount is close to MaxFileDescriptorCount, defaulting to 95% FD usage. New connections should be denied, logged to disk at debug level, and increment a ``ConnectionDeniedCount`` MBean counter. (b) Begin accepting new connections when usage drops below some configurable threshold, defaulting to 90% of FD usage, basically the high/low watermark model. (c) Update the administrators guide with a comment about using an appropriate FD limit. (d) Extra credit: if ZOOKEEPER-744 is implemented export statistics for: zookeeper_open_file_descriptor_count zookeeper_max_file_descriptor_count zookeeper_max_file_descriptor_mismatch - boolean, exported by leader, if not all zk's have the same max FD value |
70780 | No Perforce job exists for this issue. | 0 | 42153 | 9 years, 43 weeks, 6 days ago | 0|i07kwf: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
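Items (a) and (b) of the request above describe a classic high/low watermark gate. A minimal sketch of that gating logic — illustrative Python only, using the proposed 95%/90% defaults and the suggested ConnectionDeniedCount counter; the real change would live in the server's connection-accept path:

```python
class FdWatermark:
    """High/low watermark gate for accepting connections: stop accepting
    at 95% file-descriptor usage, resume once usage drops below 90%."""
    def __init__(self, max_fds, high=0.95, low=0.90):
        self.max_fds = max_fds
        self.high = high
        self.low = low
        self.accepting = True
        self.denied_count = 0  # the proposed ConnectionDeniedCount counter

    def should_accept(self, open_fds):
        usage = open_fds / self.max_fds
        if self.accepting and usage >= self.high:
            self.accepting = False          # crossed the high watermark
        elif not self.accepting and usage < self.low:
            self.accepting = True           # dropped below the low watermark
        if not self.accepting:
            self.denied_count += 1
        return self.accepting
```

The gap between the two thresholds prevents flapping when FD usage hovers near the limit, which is the point of the watermark model over a single cutoff.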
| ZooKeeper | ZOOKEEPER-758 | zkpython segfaults on invalid acl with missing key |
Bug | Closed | Major | Fixed | Kapil Thangavelu | Kapil Thangavelu | Kapil Thangavelu | 28/Apr/10 21:28 | 23/Nov/11 14:22 | 30/Apr/10 21:14 | 3.3.0, 3.4.0 | 3.3.1, 3.4.0 | contrib-bindings | 0 | 1 | ubuntu lucid (10.04) | Currently when setting an acl, there is a minimal parse to ensure that it's a list of dicts, however if one of the dicts is missing a required key, the subsequent usage doesn't check for it, and will segfault. For example, using an acl of [{"schema":id, "id":world, permissions:PERM_ALL}] will segfault if used, because the scheme key is missing (it's been purposefully typo'd to schema in the example). I've expanded the check_acl macro to include verifying that all keys are present and added some unit tests against trunk in the attachments. |
40905 | No Perforce job exists for this issue. | 3 | 32885 | 9 years, 47 weeks, 5 days ago |
Reviewed
|
0|i05zpr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-757 | zkpython acl/auth usage needs documentation + unit test |
Bug | Open | Major | Unresolved | Unassigned | Kapil Thangavelu | Kapil Thangavelu | 28/Apr/10 20:13 | 28/Apr/10 20:36 | 3.3.0, 3.4.0 | contrib-bindings, documentation | 0 | 1 | ubuntu karmic / lucid ... sun jdk 1.6.0_20 | The zookeeper digest authentication and acl scheme needs a bit more documentation. Currently its documented in the programmer guide. """ digest uses a username:password string to generate MD5 hash which is then used as an ACL ID identity. Authentication is done by sending the username:password in clear text. When used in the ACL the expression will be the username:base64 encoded SHA1 password digest. """ however its actually the digest of the entire credential that needs to be used. I've attached a python unit test that sets and verifies an acl on a node. |
214166 | No Perforce job exists for this issue. | 1 | 32886 | 9 years, 48 weeks, 1 day ago | 0|i05zpz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
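The clarification ZOOKEEPER-757 asks for — that the SHA1 is computed over the entire "username:password" credential, not just the password — can be shown directly. A sketch of building a digest-scheme ACL id, assuming ZooKeeper's documented digest scheme (the helper name here is hypothetical):

```python
import base64
import hashlib

def digest_acl_id(username, password):
    """Build the ACL id for ZooKeeper's "digest" scheme: the SHA1 digest is
    taken over the whole "username:password" credential, then base64-encoded
    and prefixed with the username."""
    credential = ("%s:%s" % (username, password)).encode("utf-8")
    digest = base64.b64encode(hashlib.sha1(credential).digest()).decode("ascii")
    return "%s:%s" % (username, digest)
```

When authenticating, the client still sends the plain "username:password" string; only the ACL expression stored on the znode uses this digested form.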
| ZooKeeper | ZOOKEEPER-756 | some cleanup and improvements for zooinspector |
Improvement | Resolved | Major | Fixed | Thomas Koch | Thomas Koch | Thomas Koch | 28/Apr/10 03:03 | 15/Dec/11 06:58 | 14/Dec/11 19:06 | 3.3.0 | 3.5.0 | contrib | 0 | 1 | Copied from the already closed ZOOKEEPER-678: * specify the exact URL where the icons are from. It's best to include the link also in the NOTICE.txt file. It seems that zooinspector finds its icons only if the icons folder is in the current path. But when I install zooinspector as part of the Zookeeper Debian package, I want to be able to call it regardless of the current path. Could you use getResources or something so that I can point to the icons location from the wrapper shell script? Can I place the zooinspector config files in /etc/zookeeper/zooinspector/ ? Could I give zooinspector a property to point to the config file location? There are several places where viewers is misspelled as "Veiwers". Please do a case insensitive search for "veiw" to correct these. Even the config file "defaultNodeVeiwers.cfg" is misspelled like this. This has the potential to confuse the hell out of people when debugging something! |
zooinspector | 42 | No Perforce job exists for this issue. | 9 | 33397 | 8 years, 15 weeks ago |
Reviewed
|
zooinspector | 0|i062vj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-755 | Improve c client documentation to reflect that zookeeper_init() creates its own copy of the list of hosts. |
Improvement | Open | Major | Unresolved | Mahadev Konar | Mahadev Konar | Mahadev Konar | 27/Apr/10 16:09 | 05/Feb/20 07:16 | 3.7.0, 3.5.8 | c client | 0 | 0 | The zookeeper.h file does not mention if zookeeper_init() creates its own copy of host string or not. We need to clarify that in the documentation. | 70734 | No Perforce job exists for this issue. | 0 | 42154 | 9 years, 48 weeks, 2 days ago | 0|i07kwn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-754 | numerous misspellings "succesfully" |
Task | Closed | Major | Fixed | Andrei Savu | Thomas Koch | Thomas Koch | 27/Apr/10 08:57 | 23/Nov/11 14:22 | 27/Apr/10 18:47 | 3.3.0 | 3.3.1, 3.4.0 | contrib-bindings, documentation | 0 | 0 | When testing the debian package of zookeeper with the standard tool "lintian", it fills my screen with complaints about the misspelling of "succesfully" in several places of the zkpython contrib. Please be so kind to correct this when you next touch the code. Thanks! | 47617 | No Perforce job exists for this issue. | 2 | 33398 | 9 years, 48 weeks, 2 days ago | fixed numerous misspellings of "succesfully" in the c client and python bindings |
Reviewed
|
0|i062vr: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-753 | update log4j dependency from 1.2.15 to 1.2.16 in branch 3.4 |
Bug | Closed | Major | Fixed | Sean Busbey | Karthik K | Karthik K | 26/Apr/10 02:31 | 13/Mar/14 14:16 | 12/Dec/12 02:41 | 3.4.5 | 3.4.6 | 2 | 4 | http://repo2.maven.org/maven2/org/apache/hadoop/zookeeper/3.3.0/zookeeper-3.3.0.pom The pom contains the log4j dependency as follows. <dependency> <groupId>log4j</groupId> <artifactId>log4j</artifactId> <version>1.2.15</version> <scope>compile</scope> </dependency> This is broken without an exclusion list, since the transitive dependencies of javax.mail etc. are not necessary for the most part. Please fix this along with 3.3.1 and republish new dependencies, since in its current state it is unusable by some projects (to host in central, say). Correct dependency for log4j: <dependency> <groupId>log4j</groupId> <artifactId>log4j</artifactId> <version>1.2.15</version> <scope>compile</scope> <exclusions> <exclusion> <groupId>javax.mail</groupId> <artifactId>mail</artifactId> </exclusion> <exclusion> <groupId>javax.jms</groupId> <artifactId>jms</artifactId> </exclusion> <exclusion> <groupId>com.sun.jdmk</groupId> <artifactId>jmxtools</artifactId> </exclusion> <exclusion> <groupId>com.sun.jmx</groupId> <artifactId>jmxri</artifactId> </exclusion> </exclusions> </dependency> |
ivy | 65458 | No Perforce job exists for this issue. | 2 | 2350 | 6 years, 2 weeks ago |
Reviewed
|
0|i00r9r: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-752 | address use of "recoverable" vs "revocable" in lock recipes documentation |
Bug | Resolved | Major | Duplicate | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 22/Apr/10 16:36 | 04/May/14 08:18 | 04/May/14 08:18 | 3.3.0 | 3.5.0 | documentation | 0 | 1 | ZOOKEEPER-751 | http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html#sc_recoverableSharedLocks uses the heading "recoverable" locks, but the text refers to "revocable". |
70765 | No Perforce job exists for this issue. | 0 | 32887 | 5 years, 46 weeks, 4 days ago | 0|i05zq7: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-751 | Recipe heading refers to 'recoverable' but should be 'revocable' |
Improvement | Resolved | Minor | Fixed | Michi Mutsuzaki | Adam Rosien | Adam Rosien | 22/Apr/10 16:34 | 05/May/14 06:55 | 04/May/14 08:39 | 3.3.0 | 3.5.0 | documentation | 0 | 3 | ZOOKEEPER-752 | http://hadoop.apache.org/zookeeper/docs/r3.3.0/recipes.html#sc_recoverableSharedLocks uses the heading "recoverable" locks, but the text refers to "revocable". | 214165 | No Perforce job exists for this issue. | 1 | 42155 | 5 years, 46 weeks, 3 days ago | documentation recipe | 0|i07kwv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-750 | move maven artifacts into "dist-maven" subdir of the release (package target) |
Bug | Closed | Major | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 22/Apr/10 12:45 | 23/Nov/11 14:22 | 28/Apr/10 22:26 | 3.3.0 | 3.3.1, 3.4.0 | build | 0 | 1 | ZOOKEEPER-749 | The maven artifacts are currently (3.3.0) put into the toplevel of the release. This causes confusion amonst new users (ie "which jar do I use?"). Also the naming of the bin jar is wrong for maven (to put onto the maven repo it must be named without the -bin) which adds extra burden for the release manager. Putting into a subdir fixes this and makes it explicit what's being deployed to maven repo. |
47618 | No Perforce job exists for this issue. | 0 | 32888 | 9 years, 48 weeks ago | 0|i05zqf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-749 | OSGi metadata not included in binary only jar |
Bug | Closed | Critical | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 22/Apr/10 12:21 | 23/Nov/11 14:22 | 28/Apr/10 22:02 | 3.3.0 | 3.3.1, 3.4.0 | build | 0 | 2 | ZOOKEEPER-750, ZOOKEEPER-425 | See this JIRA/comment for background: https://issues.apache.org/jira/browse/ZOOKEEPER-425?focusedCommentId=12859697&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12859697 basically the issue is that OSGi metadata is included in the legacy jar (zookeeper-<version>.jar) but not in the binary only jar (zookeeper-<version>-bin.jar) which is eventually deployed to the maven repo. |
47619 | No Perforce job exists for this issue. | 1 | 32889 | 9 years, 48 weeks ago |
Reviewed
|
0|i05zqn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-748 | zkPython's NodeExistsException should include information about the node that exists |
Improvement | Open | Major | Unresolved | Unassigned | Joseph Koshy | Joseph Koshy | 22/Apr/10 00:58 | 05/Feb/20 07:16 | 3.3.0 | 3.7.0, 3.5.8 | contrib-bindings | 0 | 4 | Currently the code creates a {{zookeeper.NodeExistsException}} object with a string argument "node exists". Including the name of the node that caused the exception would be useful, in that it allows user code like the following: {code:title=example1} try: zookeeper.create(zh, n1, ...) zookeeper.create(zh, n2, ...) except zookeeper.NodeExistsException, n: print "Node \"%s\" exists." % n {code} |
70738 | No Perforce job exists for this issue. | 0 | 42156 | 9 years, 49 weeks ago | 0|i07kx3: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
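ZOOKEEPER-748 above asks that zkPython's NodeExistsException carry the path of the node that already exists, so callers can tell which of several create() calls collided. A minimal sketch of what that would enable — the exception class, its `path` attribute, and the `create_node` stand-in are all hypothetical illustrations, not the real zkpython API:

```python
# Hypothetical sketch for ZOOKEEPER-748: an exception that carries the
# offending path. Names here are illustrative, not the zkpython API.
class NodeExistsException(Exception):
    def __init__(self, path):
        super().__init__('node exists: "%s"' % path)
        self.path = path  # the node that caused the collision

def create_node(existing, path):
    """Toy stand-in for zookeeper.create(): records path or raises."""
    if path in existing:
        raise NodeExistsException(path)
    existing.add(path)

nodes = {"/app/config"}
try:
    create_node(nodes, "/app/config")
except NodeExistsException as e:
    print('Node "%s" exists.' % e.path)  # prints: Node "/app/config" exists.
```

With the path attached, user code no longer has to guess which create in a sequence failed, which is the gap the issue description's example points out.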
| ZooKeeper | ZOOKEEPER-747 | Add C# generation to Jute |
New Feature | Closed | Major | Fixed | Eric Hauser | Eric Hauser | Eric Hauser | 21/Apr/10 23:17 | 30/Jan/12 12:25 | 03/May/10 17:50 | 3.4.0 | jute | 0 | 5 | The following patch adds a new language, C#, to the Jute code generation. The code that is generated does have a dependency on a third party library, Jon Skeet's MiscUtil, which is Apache licensed. The library is necessary because C# does not provide big endian support in the base class libraries. As none of the existing Jute code has any unit tests, I have not added tests for this patch. |
47620 | No Perforce job exists for this issue. | 1 | 33399 | 8 years, 8 weeks, 3 days ago |
Reviewed
|
0|i062vz: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-746 | learner outputs session id to log in dec (should be hex) |
Bug | Closed | Minor | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 21/Apr/10 18:10 | 23/Nov/11 14:22 | 26/Apr/10 00:18 | 3.3.1, 3.4.0 | quorum, server | 0 | 1 | usability issue, should be in hex: 2010-04-21 11:31:13,827 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11354:Learner@95] - Revalidating client: 83353578391797760 |
47621 | No Perforce job exists for this issue. | 1 | 32890 | 9 years, 48 weeks, 3 days ago |
Reviewed
|
0|i05zqv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-745 | zkpython documentation |
Task | Open | Major | Unresolved | Unassigned | Henry Robinson | Henry Robinson | 21/Apr/10 13:35 | 21/Apr/10 13:35 | 0 | 0 | zkpython deserves better documentation than the README I have given it. This jira is for tracking a document that includes at a minimum: 1. Installation instructions 2. Basic usage instructions, including common idiomatic use 3. API reference |
214164 | No Perforce job exists for this issue. | 0 | 42157 | 9 years, 49 weeks, 1 day ago | 0|i07kxb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-744 | Add monitoring four-letter word |
New Feature | Closed | Major | Fixed | Andrei Savu | Travis Crawford | Travis Crawford | 19/Apr/10 16:14 | 23/Nov/11 14:22 | 05/Jul/10 14:22 | 3.4.0 | 3.4.0 | server | 0 | 3 | ZOOKEEPER-799, ZOOKEEPER-613, ZOOKEEPER-701 | Filing a feature request based on a zookeeper-user discussion. Zookeeper should have a new four-letter word that returns key-value pairs appropriate for importing to a monitoring system (such as Ganglia which has a large installed base) This command should initially export the following: (a) Count of instances in the ensemble. (b) Count of up-to-date instances in the ensemble. But be designed such that in the future additional data can be added. For example, the output could define the statistic in a comment, then print a key "space character" value line: """ # Total number of instances in the ensemble zk_ensemble_instances_total 5 # Number of instances currently participating in the quorum. zk_ensemble_instances_active 4 """ From the mailing list: """ Date: Mon, 19 Apr 2010 12:10:44 -0700 From: Patrick Hunt <phunt@apache.org> To: zookeeper-user@hadoop.apache.org Subject: Re: Recovery issue - how to debug? On 04/19/2010 11:55 AM, Travis Crawford wrote: > It would be a lot easier from the operations perspective if the leader > explicitly published some health stats: > > (a) Count of instances in the ensemble. > (b) Count of up-to-date instances in the ensemble. > > This would greatly simplify monitoring& alerting - when an instance > falls behind one could configure their monitoring system to let > someone know and take a look at the logs. That's a great idea. Please enter a JIRA for this - a new 4 letter word and JMX support. It would also be a great starter project for someone interested in becoming more familiar with the server code. Patrick """ |
47622 | No Perforce job exists for this issue. | 5 | 33400 | 9 years, 38 weeks, 2 days ago | Added new 4letter word for monitoring: "mntr". The output is compatible with the Java properties format. Your script should expect content changes: new keys could be added in the future. |
Reviewed
|
zookeeper monitoring | 0|i062w7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
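The release note for ZOOKEEPER-744 above says the new "mntr" four-letter word emits one key/value pair per line in a Java-properties-compatible format, and that scripts should tolerate new keys. A hedged sketch of a parser for such output — the tab separator and the sample payload are assumptions for illustration, not taken verbatim from the issue:

```python
def parse_mntr(payload):
    """Parse 'mntr'-style output: one 'key<TAB>value' pair per line."""
    stats = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # tolerate blank lines and comment lines
        key, _, value = line.partition("\t")
        stats[key.strip()] = value.strip()
    return stats

sample = ("zk_version\t3.4.0\n"
          "zk_ensemble_instances_total\t5\n"
          "zk_ensemble_instances_active\t4\n")
print(parse_mntr(sample)["zk_ensemble_instances_active"])  # prints: 4
```

Because unknown keys are simply carried through into the dict, a monitoring script written this way keeps working when the server starts reporting additional statistics, which is exactly the forward-compatibility the release note asks for.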
| ZooKeeper | ZOOKEEPER-743 | Diagram error on Zookeeper internals page |
Bug | Open | Trivial | Unresolved | Unassigned | Ivan Kelly | Ivan Kelly | 16/Apr/10 10:20 | 16/Apr/10 10:20 | 0 | 0 | http://hadoop.apache.org/zookeeper/docs/r3.1.2/zookeeperInternals.html In the active messaging diagram, one of the commit arrows is going the wrong way. |
214163 | No Perforce job exists for this issue. | 0 | 32891 | 9 years, 49 weeks, 6 days ago | 0|i05zr3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-742 | Deallocating None on writes |
Bug | Closed | Major | Fixed | Henry Robinson | Josh Fraser | Josh Fraser | 15/Apr/10 19:21 | 23/Nov/11 14:22 | 22/Apr/10 02:44 | 3.2.2, 3.3.0 | 3.3.1, 3.4.0 | c client, contrib, contrib-bindings | 0 | 1 | ZOOKEEPER-631 | Redhat Enterprise 5.4 (python 2.4.3), Mac OS X 10.5.8 (python 2.5.1) | On write operations, getting: Fatal Python error: deallocating None Aborted This error happens on write operations only. Here's the backtrace: Fatal Python error: deallocating None Program received signal SIGABRT, Aborted. 0x000000383fc30215 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x000000383fc30215 in raise () from /lib64/libc.so.6 #1 0x000000383fc31cc0 in abort () from /lib64/libc.so.6 #2 0x00002adbd0be8189 in Py_FatalError () from /usr/lib64/libpython2.4.so.1.0 #3 0x00002adbd0bc7493 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #4 0x00002adbd0bcab66 in PyEval_EvalFrame () from /usr/lib64/libpython2.4.so.1.0 #5 0x00002adbd0bcbfe5 in PyEval_EvalCodeEx () from /usr/lib64/libpython2.4.so.1.0 #6 0x00002adbd0bcc032 in PyEval_EvalCode () from /usr/lib64/libpython2.4.so.1.0 #7 0x00002adbd0be8729 in ?? () from /usr/lib64/libpython2.4.so.1.0 #8 0x00002adbd0be9bd8 in PyRun_SimpleFileExFlags () from /usr/lib64/libpython2.4.so.1.0 #9 0x00002adbd0bf000d in Py_Main () from /usr/lib64/libpython2.4.so.1.0 #10 0x000000383fc1d974 in __libc_start_main () from /lib64/libc.so.6 #11 0x0000000000400629 in _start () |
47623 | No Perforce job exists for this issue. | 4 | 32892 | 9 years, 49 weeks ago | 0|i05zrb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-741 | root level create on REST proxy fails |
Bug | Closed | Critical | Fixed | Patrick D. Hunt | Patrick D. Hunt | Patrick D. Hunt | 14/Apr/10 13:12 | 23/Nov/11 14:22 | 22/Apr/10 01:43 | 3.3.0 | 3.3.1, 3.4.0 | contrib | 0 | 1 | Create /foo using the REST proxy fails. Also upgrade to the latest Jersey/Grizzly while we are at it (fixes for func/security) |
47624 | No Perforce job exists for this issue. | 1 | 32893 | 9 years, 49 weeks ago |
Reviewed
|
0|i05zrj: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-740 | zkpython leading to segfault on zookeeper |
Bug | Resolved | Major | Fixed | Henry Robinson | Federico | Federico | 13/Apr/10 04:08 | 24/Apr/14 21:45 | 24/Apr/14 21:45 | 3.3.0 | 0 | 5 | ZOOKEEPER-670, ZOOKEEPER-888, ZOOKEEPER-631 | The program that we are implementing uses the python binding for zookeeper but sometimes it crash with segfault; here is the bt from gdb: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0xad244b70 (LWP 28216)] 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488 2488 ../Objects/abstract.c: No such file or directory. in ../Objects/abstract.c (gdb) bt #0 0x080611d5 in PyObject_Call (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Objects/abstract.c:2488 #1 0x080d6ef2 in PyEval_CallObjectWithKeywords (func=0x862fab0, arg=0x8837194, kw=0x0) at ../Python/ceval.c:3575 #2 0x080612a0 in PyObject_CallObject (o=0x862fab0, a=0x8837194) at ../Objects/abstract.c:2480 #3 0x0047af42 in watcher_dispatch (zzh=0x86174e0, type=-1, state=1, path=0x86337c8 "", context=0x8588660) at src/c/zookeeper.c:314 #4 0x00496559 in do_foreach_watcher (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:275 #5 deliverWatchers (zh=0x86174e0, type=-1, state=1, path=0x86337c8 "", list=0xa5354140) at src/zk_hashtable.c:317 #6 0x0048ae3c in process_completions (zh=0x86174e0) at src/zookeeper.c:1766 #7 0x0049706b in do_completion (v=0x86174e0) at src/mt_adaptor.c:333 #8 0x0013380e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0 #9 0x002578de in clone () from /lib/tls/i686/cmov/libc.so.6 |
59683 | No Perforce job exists for this issue. | 1 | 32894 | 5 years, 47 weeks, 6 days ago | 0|i05zrr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-739 | use a simple command-line parsing library for flexibility in command-line arguments |
Improvement | Open | Major | Unresolved | Unassigned | Karthik K | Karthik K | 12/Apr/10 19:10 | 12/Apr/10 19:10 | 0 | 2 | JOpt is being used by HBase team and very light-weight. http://jopt-simple.sourceforge.net/examples.html <jopt.version>3.2</jopt.version> mvn artifacts are available in public repositories, so integrating with ivy should not be an issue either. Check if that makes sense. |
214162 | No Perforce job exists for this issue. | 0 | 42158 | 9 years, 50 weeks, 3 days ago | 0|i07kxj: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-738 | zookeeper.jute.h fails to compile with -pedantic |
Bug | Closed | Major | Fixed | Jozef Hatala | Patrick D. Hunt | Patrick D. Hunt | 10/Apr/10 17:17 | 23/Nov/11 14:22 | 26/Apr/10 14:58 | 3.3.0 | 3.3.1, 3.4.0 | c client | 0 | 1 | /home/y/include/zookeeper/zookeeper.jute.h:96: error: extra semicolon /home/y/include/zookeeper/zookeeper.jute.h:158: error: extra semicolon /home/y/include/zookeeper/zookeeper.jute.h:288: error: extra semicolon the code generator needs to be updated to not output a naked semi |
47625 | No Perforce job exists for this issue. | 1 | 32895 | 9 years, 48 weeks, 3 days ago |
Reviewed
|
0|i05zrz: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-737 | some 4 letter words may fail with netcat (nc) |
Bug | Closed | Blocker | Fixed | Mahadev Konar | Patrick D. Hunt | Patrick D. Hunt | 10/Apr/10 17:13 | 29/Dec/11 18:08 | 04/May/10 17:51 | 3.3.0 | 3.3.1, 3.4.0 | server | 0 | 3 | ZOOKEEPER-805, ZOOKEEPER-1346 | nc closes the write channel as soon as it's sent it's information, for example "echo stat|nc localhost 2181" in general this is fine, however the server code will close the socket as soon as it receives notice that nc has closed it's write channel. if not all the 4 letter word result has been written back to the client yet, this will cause some or all of the result to be lost - ie the client will not see the full result. this was introduced in 3.3.0 as part of a change to reduce blocking of the selector by long running 4letter words. here's an example of the logs from the server during this echo -n stat | nc localhost 2181 2010-04-09 21:55:36,124 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket connection from /127.0.0.1:42179 2010-04-09 21:55:36,124 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@968] - Processing stat command from /127.0.0.1:42179 2010-04-09 21:55:36,125 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@606] - EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket 2010-04-09 21:55:36,125 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1286] - Closed socket connection for client /127.0.0.1:42179 (no session established for client) [phunt@gsbl90850 zookeeper-3.3.0]$ 2010-04-09 21:55:36,126 - ERROR [Thread-15:NIOServerCnxn@422] - Unexpected Exception: java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59) at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:395) at 
org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:907) at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945) at java.io.BufferedWriter.flush(BufferedWriter.java:236) at java.io.PrintWriter.flush(PrintWriter.java:276) at org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089) 2010-04-09 21:55:36,126 - ERROR [Thread-15:NIOServerCnxn$Factory$1@82] - Thread Thread[Thread-15,5,main] died java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:64) at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.wakeup(NIOServerCnxn.java:927) at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:909) at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.flush(NIOServerCnxn.java:945) at java.io.BufferedWriter.flush(BufferedWriter.java:236) at java.io.PrintWriter.flush(PrintWriter.java:276) at org.apache.zookeeper.server.NIOServerCnxn$2.run(NIOServerCnxn.java:1089) |
47626 | No Perforce job exists for this issue. | 7 | 32896 | 8 years, 24 weeks, 1 day ago |
Reviewed
|
0|i05zs7: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-736 | docs for server config options should specify which are required and which have defaults |
Bug | Open | Major | Unresolved | Unassigned | Patrick D. Hunt | Patrick D. Hunt | 08/Apr/10 19:17 | 05/Feb/20 07:16 | 3.3.0 | 3.7.0, 3.5.8 | documentation | 0 | 0 | the docs (admin) should do a better job specifying which config parameters are required and the defaults if any. initLimit/syncLimit are both examples where we don't do this |
66540 | No Perforce job exists for this issue. | 0 | 32897 | 9 years, 51 weeks ago | 0|i05zsf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-735 | cppunit test testipv6 assumes that the machine is ipv6 enabled. |
Bug | Closed | Major | Fixed | Mahadev Konar | Mahadev Konar | Mahadev Konar | 07/Apr/10 17:26 | 23/Nov/11 14:22 | 08/Apr/10 14:32 | 3.3.1, 3.4.0 | tests | 0 | 1 | The test should be fixed so that it runs only if ipv6 is enabled and does not run if ipv6 is not enabled. | 47627 | No Perforce job exists for this issue. | 1 | 32898 | 9 years, 51 weeks ago |
Reviewed
|
0|i05zsn: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-734 | QuorumPeerTestBase.java and ZooKeeperServerMainTest.java do not handle windows path correctly |
Bug | Closed | Major | Fixed | Vishal Kher | Vishal Kher | Vishal Kher | 06/Apr/10 18:43 | 23/Nov/11 14:22 | 26/Apr/10 16:02 | 3.3.0 | 3.3.1, 3.4.0 | tests | 0 | 1 | Windows 32-bit | While running "ant test-core-java" QuorumPeerTestBase.java and ZooKeeperServerMainTest.java fail. The problem seems to be in ZookeeperserverMainTest.java:MainThread():66 and in QuorumPeerBaseTest.java:MainThread:76. FileWriter.write() writes windows path to the conf file. Java does not like windows path. Therefore, the test complains that it cannot find myid and fails. Solution - convert windows path to UNIX path. This worked for me on windows. Diffs are attached below. Solution not tested on Linux since for some reason build is failing (due to problems not related to this change). vmc-floorb-dhcp116-114:/opt/zksrc/zookeeper-3.3.0/src/java/test/org/apache/zookeeper/server # svn diff Index: ZooKeeperServerMainTest.java =================================================================== --- ZooKeeperServerMainTest.java (revision 931240) +++ ZooKeeperServerMainTest.java (working copy) @@ -61,7 +61,8 @@ if (!dataDir.mkdir()) { throw new IOException("unable to mkdir " + dataDir); } - fwriter.write("dataDir=" + dataDir.toString() + "\n"); + String data = dataDir.toString().replace('\\', '/'); + fwriter.write("dataDir=" + data + "\n"); fwriter.write("clientPort=" + clientPort + "\n"); fwriter.flush(); Index: quorum/QuorumPeerTestBase.java =================================================================== --- quorum/QuorumPeerTestBase.java (revision 931240) +++ quorum/QuorumPeerTestBase.java (working copy) @@ -73,7 +73,8 @@ if (!dataDir.mkdir()) { throw new IOException("Unable to mkdir " + dataDir); } - fwriter.write("dataDir=" + dataDir.toString() + "\n"); + String data = dataDir.toString().replace('\\', '/'); + fwriter.write("dataDir=" + data + "\n"); fwriter.write("clientPort=" + clientPort + "\n"); fwriter.write(quorumCfgSection + "\n"); |
47628 | No Perforce job exists for this issue. | 1 | 32899 | 9 years, 48 weeks, 3 days ago |
Reviewed
|
0|i05zsv: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-733 | use netty to handle client connections |
Improvement | Closed | Major | Fixed | Patrick D. Hunt | Benjamin Reed | Benjamin Reed | 05/Apr/10 10:44 | 24/Mar/17 13:17 | 18/Aug/10 02:25 | 3.4.0 | server | 0 | 4 | ZOOKEEPER-2737, ZOOKEEPER-823, AVRO-405, ZOOKEEPER-845 | we currently have our own asynchronous NIO socket engine to be able to handle lots of clients with a single thread. over time the engine has become more complicated. we would also like the engine to use multiple threads on machines with lots of cores. plus, we would like to be able to support things like SSL. if we switch to netty, we can simplify our code and get the previously mentioned benefits. | 47629 | No Perforce job exists for this issue. | 13 | 33401 | 9 years, 28 weeks, 1 day ago |
Reviewed
|
0|i062wf: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-732 | Improper translation of error into Python exception |
Bug | Closed | Minor | Fixed | Lei Zhang | Gustavo Niemeyer | Gustavo Niemeyer | 29/Mar/10 18:48 | 13/Mar/14 14:17 | 03/Oct/13 17:51 | 3.3.0 | 3.4.6, 3.5.0 | contrib-bindings | 0 | 6 | Apparently errors returned by the C library are not being correctly converted into a Python exception in some cases: >>> zookeeper.get_children(0, "/", None) Traceback (most recent call last): File "<stdin>", line 1, in <module> SystemError: error return without exception set |
47630 | No Perforce job exists for this issue. | 4 | 32900 | 6 years, 2 weeks ago | Client that uses python binding may receive SystemError on session expiration. |
Reviewed
|
0|i05zt3: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-731 | Zookeeper#delete , #create - async versions miss a verb in the javadoc |
Bug | Closed | Minor | Fixed | Thomas Koch | Karthik K | Karthik K | 26/Mar/10 19:12 | 23/Nov/11 14:22 | 06/Sep/11 10:12 | 3.3.0 | 3.4.0 | documentation | 0 | 2 | /** * The Asynchronous version of delete. "The request doesn't *missing* actually until * the asynchronous callback is called." */ public void delete(final String path, int version, VoidCallback cb, Object ctx) .. Also some information in the javadoc about how to instantiate the callback objects / context would be useful . |
47631 | No Perforce job exists for this issue. | 3 | 32901 | 8 years, 29 weeks, 2 days ago | 0|i05ztb: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-730 | C cli: Add a command to recursively delete a znode |
New Feature | Open | Major | Unresolved | Unassigned | Karthik K | Karthik K | 26/Mar/10 18:38 | 14/Dec/19 06:08 | 3.7.0 | c client | 0 | 4 | ZOOKEEPER-729 | ZOOKEEPER-729 talks about recursively deleting a znode in java. Once the review is complete and frozen, equivalent functionality need to be available in C client as well. Tracker jira for the same. |
214161 | No Perforce job exists for this issue. | 0 | 42159 | 5 years, 47 weeks, 6 days ago | 0|i07kxr: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ZooKeeper | ZOOKEEPER-729 | Recursively delete a znode - zkCli.sh rmr /node |
New Feature | Closed | Major | Fixed | Karthik K | Karthik K | Karthik K | 26/Mar/10 15:30 | 28/Dec/11 11:13 | 15/Apr/10 03:26 | 3.4.0 | java client | 0 | 2 | ZOOKEEPER-730, ZOOKEEPER-1326 | Recursively delete a given znode in zookeeper, from the command-line. New operation "rmr" added to zkclient. $ ./zkCli.sh rmr /node |
47632 | No Perforce job exists for this issue. | 5 | 33402 | 8 years, 13 weeks, 1 day ago |
Reviewed
|
0|i062wn: |
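ZOOKEEPER-729 and ZOOKEEPER-730 above add recursive znode deletion (the "rmr" command). Since a znode that still has children cannot be deleted, the core algorithm is a post-order traversal: delete the children first, then the node itself. A minimal sketch against a toy in-memory client — `FakeZk` and its methods are illustrative stand-ins, not the real ZooKeeper client API:

```python
class FakeZk:
    """Tiny in-memory stand-in for a ZooKeeper client (illustration only)."""
    def __init__(self, paths):
        self.paths = set(paths)

    def get_children(self, path):
        # Direct children only: first path segment under 'path/'.
        prefix = path.rstrip("/") + "/"
        return sorted({p[len(prefix):].split("/")[0]
                       for p in self.paths if p.startswith(prefix)})

    def delete(self, path):
        # Real servers refuse to delete a znode that still has children.
        if self.get_children(path):
            raise RuntimeError("not empty: " + path)
        self.paths.discard(path)

def delete_recursive(zk, path):
    """Post-order delete ('rmr'): children first, then the node itself."""
    for child in zk.get_children(path):
        delete_recursive(zk, path.rstrip("/") + "/" + child)
    zk.delete(path)

zk = FakeZk({"/node", "/node/a", "/node/a/x", "/node/b"})
delete_recursive(zk, "/node")
print(zk.paths)  # prints: set()
```

The same traversal order is what any client-side "rmr" must follow, whether in Java, C (ZOOKEEPER-730), or a shell wrapper: a breadth-first or pre-order delete would fail on the first non-empty parent.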
| Generated at Fri Mar 20 00:35:58 UTC 2020 by Song Xu using Jira 8.3.4#803005-sha1:1f96e09b3c60279a408a2ae47be3c745f571388b. |